Senior AI Systems Engineer - Conversational AI Platform

The Future of Voice, Dubai

Job Description

Company Overview

Xpress Innovations is building a cutting-edge conversational AI voice platform, an upgraded version of Zo, our flagship AI platform, to provide natural, human-like AI customer support and training in multiple languages. We are seeking a highly skilled Senior AI Systems Engineer to lead the deployment, configuration, and optimization of on-premises AI infrastructure powered by NVIDIA servers with multiple high-performance GPUs.

This role will be pivotal in delivering a scalable, low-latency voice platform using open-source AI tools.

Job Summary

The Senior AI Systems Engineer will be responsible for installing, configuring, and optimizing a conversational AI voice platform on NVIDIA servers running Ubuntu Server 24.04 LTS. The platform leverages open-source tools, including OpenAI Whisper (Speech-to-Text), NVIDIA NeMo (Text-to-Speech), Hugging Face Transformers (Natural Language Understanding), and Llama 3.1 (Large Language Model), fine-tuned on our domain-specific knowledge base.

The candidate will ensure high performance (sub-100ms latency, thousands of concurrent users), GPU optimization (CUDA, TensorRT, NVLink), and seamless integration into a production environment.

Key Responsibilities

Infrastructure Setup:

  • Install and configure Ubuntu Server 24.04 LTS on NVIDIA servers, ensuring compatibility with NVLink and the SXM5 form factor.

  • Set up NVIDIA drivers, CUDA (12.x), cuDNN, and NVIDIA Container Toolkit for GPU-accelerated workloads.

  • Configure high-speed networking (e.g., NVIDIA Quantum-X800 InfiniBand) and NVMe storage for optimal data access.

AI Tool Deployment:

  • Deploy OpenAI Whisper (large-v3) for real-time Speech-to-Text (STT) with multilingual support.

  • Implement NVIDIA NeMo FastPitch + HiFi-GAN for high-quality, customizable Text-to-Speech (TTS).

  • Configure Hugging Face Transformers (e.g., DistilBERT, RoBERTa) for Natural Language Understanding (NLU) tasks like intent classification and entity recognition.

  • Deploy Llama 3.1 (70B) or Mixtral 8x7B for dialogue generation, integrating Retrieval-Augmented Generation (RAG) with a vector database (e.g., FAISS).

Model Optimization:

  • Optimize models for H200 GPUs using NVIDIA TensorRT-LLM and vLLM to achieve sub-100ms latency for STT, TTS, NLU, and LLM inference.

  • Leverage 900 GB/s NVLink bandwidth for multi-GPU parallelism (e.g., tensor parallelism for Llama 3.1 70B).

  • Fine-tune models on our knowledge base (e.g., 10-20 hours of audio, 10,000 text samples) using NeMo, Hugging Face, and LoRA for efficient training.

Pipeline Integration:

  • Build a real-time conversational AI pipeline using WebSocket or gRPC for audio/text exchange, integrating STT, NLU, LLM, and TTS components.

  • Use FastAPI and Python libraries (e.g., aiortc) for low-latency communication.

  • Containerize the stack with Docker and NVIDIA Container Toolkit for scalability and reproducibility.

Performance and Scalability:

  • Achieve the performance targets stated in the Job Summary (sub-100ms latency, thousands of concurrent users).

  • Scale the platform across four H200 GPUs, optimizing batching and resource allocation.

  • Implement monitoring with Prometheus and Grafana to track GPU utilization, latency, and throughput.

Fine-Tuning and Knowledge Base Integration:

  • Collect and preprocess domain-specific data (audio for STT/TTS, text for NLU/LLM) to fine-tune models.

  • Implement RAG with FAISS or similar to dynamically query our knowledge base during LLM inference.

  • Validate model accuracy (e.g., >90% STT transcription accuracy, >90% NLU intent accuracy) on domain-specific benchmarks.

Maintenance and Documentation:

  • Maintain the platform, applying updates to OS, drivers, and AI tools while ensuring stability.

  • Document installation, configuration, and optimization processes for internal teams.

  • Train team members on platform usage and troubleshooting.

Qualifications

Education: Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field. PhD or equivalent experience in AI/ML is a plus.

Experience:

  • 5+ years in AI systems engineering, with 3+ years deploying GPU-accelerated AI workloads on NVIDIA hardware.

  • Proven experience with NVIDIA HGX/DGX servers, CUDA, TensorRT, and NVLink.

  • Hands-on experience with open-source AI tools: OpenAI Whisper, NVIDIA NeMo, Hugging Face Transformers, and LLMs (e.g., Llama, Mistral).

  • Expertise in fine-tuning AI models (STT, TTS, NLU, LLM) on domain-specific datasets.

  • Experience building real-time AI pipelines with WebSocket, gRPC, or similar protocols.

Technical Skills:

  • Proficiency in Python, PyTorch, and Linux (Ubuntu Server preferred).

  • Expertise in the NVIDIA software stack: CUDA, cuDNN, TensorRT-LLM, vLLM, NVIDIA Container Toolkit.

  • Familiarity with Docker, Kubernetes, and orchestration for scalable AI deployments.

  • Knowledge of networking (InfiniBand, Ethernet) and storage (NVMe) in HPC environments.

  • Experience with monitoring tools (Prometheus, Grafana) and vector databases (FAISS, Pinecone).

Soft Skills:

  • Strong problem-solving and debugging skills for complex AI systems.

  • Ability to work independently and collaborate with cross-functional teams (e.g., data scientists, developers).

  • Excellent communication skills to document processes and train team members.

Preferred Qualifications

Experience with conversational AI platforms (e.g., call center, virtual assistants).

Familiarity with multilingual AI models and low-latency audio processing.

Knowledge of RAG and knowledge base integration for LLMs.

Contributions to open-source AI projects or NVIDIA NGC community.

Compensation and Benefits
  • Salary: Excellent tax-free salary, commensurate with experience.
  • Benefits: Full health insurance coverage plus a professional development budget.
  • Location: On-site in Dubai/Abu Dhabi.

Why Join Us

Be at the forefront of conversational AI, deploying a state-of-the-art voice platform on cutting-edge systems. Work with a passionate team to deliver seamless, multilingual customer experiences globally. Your expertise will shape a scalable, high-impact AI solution tailored to our unique needs.
