AI / MLOps Engineer (Infrastructure, Monitoring & Deployment)
Job Description
Summary
We are seeking a highly skilled AI / MLOps Engineer to build, deploy, monitor, and manage large-scale AI infrastructure based on HGX H200 nodes. You will play a central role in deploying LLMs, fine-tuning models, automating CI/CD workflows, monitoring model behavior, and maintaining uptime. The role spans infrastructure setup, orchestration, model serving, and operational reliability, and closely supports all aspects of the AI model lifecycle in a production environment.
Key Responsibilities
Operate and manage Kubernetes or OpenShift clusters for multi-node orchestration
Deploy and manage LLMs and other AI models for inference using Triton Inference Server or custom endpoints
Automate CI/CD pipelines for model packaging, serving, retraining, and rollback using GitLab CI or ArgoCD
Set up model and infrastructure monitoring systems (Prometheus, Grafana, NVIDIA DCGM)
Implement model drift detection, performance alerting, and inference logging
Manage model checkpoints, reproducibility controls, and rollback strategies
Track deployed model versions using MLflow or equivalent registry tools
Implement secure access controls for model endpoints and data artifacts
Collaborate with the AI / Data Engineer to integrate and deploy fine-tuning datasets
Ensure high availability, performance, and observability of all AI services in production
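To illustrate the scope of the drift-detection responsibility above, here is a minimal sketch of a Population Stability Index (PSI) check comparing production inference inputs against a training-time baseline. The function names and thresholds are illustrative assumptions, not part of the role's mandated stack.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample (`expected`,
    e.g. training-time feature values) and a live sample (`actual`).
    Values near 0 mean no drift; > 0.2 is a common alerting threshold."""
    # Bin edges derived from the baseline distribution.
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            # Number of edges at or below x gives the bin index;
            # values beyond the last edge fall into the final bin.
            counts[sum(e <= x for e in edges)] += 1
        # Small epsilon avoids log(0) for empty buckets.
        return [(c + 1e-6) / (len(sample) + bins * 1e-6) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

In practice a check like this would run on a schedule against logged inference inputs, with the PSI value exported to Prometheus so Grafana can alert when it crosses the chosen threshold.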
Required Qualifications
3+ years experience in DevOps, MLOps, or AI/ML infrastructure roles
10+ years of overall experience in solution operations
Proven experience with Kubernetes or OpenShift in production environments; certification preferred
Familiarity with deploying and scaling PyTorch or TensorFlow models for inference
Experience with CI/CD automation tools on OpenShift / Kubernetes
Hands-on experience with model registry systems (e.g., MLflow, Kubeflow)
Experience with monitoring tools (e.g., Prometheus, Grafana) and GPU workload optimization
Strong scripting skills (Python, Bash) and Linux system administration knowledge
Preferred (Bonus) Skills
Experience with Triton Inference Server or NVIDIA AI stack
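For context on the Triton Inference Server item, serving a model there revolves around a per-model `config.pbtxt`. The sketch below shows the general shape for a hypothetical ONNX model; the model name, tensor names, and dimensions are illustrative assumptions, not specifics of this role.

```
# config.pbtxt — hypothetical example model
name: "example_llm"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
# Pin one instance per GPU; tune count for throughput.
instance_group [ { kind: KIND_GPU, count: 1 } ]
```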