Senior Engineer – Network Operations

Core42, Abu Dhabi

Job Description

About Us

Core42, a leader in AI-powered cloud and digital infrastructure, is driving transformative technology solutions globally. Leveraging advanced resources and partnerships, Core42 empowers clients to harness sovereign AI infrastructure, especially in sectors with stringent regulatory needs.

With a mission to redefine digital transformation, we combine sovereign capabilities with scalable, high-performance compute infrastructure, positioning ourselves at the forefront of AI innovation in the Middle East and beyond.

The Opportunity

We are seeking a highly skilled Senior Engineer – Network Operations to support the daily operations, optimization, and reliability of the network infrastructure underpinning our global high-performance computing (HPC) environments.

This role is responsible for ensuring the availability, security, and performance of switches, firewalls, and network fabrics supporting large-scale AI and ML workloads across geographically distributed data centers. The ideal candidate will bring strong hands-on experience with enterprise networking technologies, low-latency HPC fabrics (e.g., InfiniBand), and modern network operations and automation practices.

Your Key Responsibilities
  • Support the daily operations of HPC network infrastructure, including Layer 2/3 switches, routers, firewalls, and RDMA-based fabrics (e.g., InfiniBand, RoCE), ensuring high performance and operational stability.
  • Troubleshoot and resolve complex network issues impacting HPC workloads and services, minimizing downtime and maintaining service reliability.
  • Configure, maintain, and upgrade enterprise-grade firewalls, VPNs, ACLs, and routing protocols (e.g., BGP, OSPF) while ensuring secure and efficient network operations.
  • Provide network support and integration for HPC platforms, including Slurm, Kubernetes, and bare-metal provisioning environments.
  • Support IP address management, VLAN configuration, network segmentation, and security zoning aligned with operational and compliance standards.
  • Develop and maintain automation scripts and infrastructure-as-code solutions using tools such as Python, Ansible, and Terraform to improve operational efficiency.
  • Collaborate closely with compute, storage, security, and site reliability teams to support scalable and resilient network solutions for AI and HPC workloads.
  • Maintain network documentation, operational runbooks, configurations, and change records in accordance with ITIL and operational standards.
  • Participate in on-call rotations and support incident management, change activities, and root cause analysis (RCA) processes.
  • Contribute to continuous improvement initiatives by identifying recurring operational challenges and recommending optimization opportunities.
  • Provide technical guidance and knowledge sharing within the engineering team.
  • Ensure adherence to security policies, operational procedures, and audit requirements.
What We're Looking For
  • Bachelor's degree in Network Engineering, Computer Science, or a related technical field, or equivalent practical experience.
  • Minimum of 5 years of experience in enterprise network operations, network engineering, or infrastructure support environments.
  • Strong hands-on experience with enterprise and data center networking technologies such as Cisco, Arista, Juniper, Mellanox, or NVIDIA Networking.
  • Solid understanding of Layer 2/3 networking concepts including TCP/IP, multicast, QoS, VLAN/VXLAN, EVPN, and routing protocols.
  • Experience configuring and managing firewalls and VPN technologies (e.g., Palo Alto, Fortinet, Cisco ASA).
  • Experience supporting high-performance, low-latency network environments within HPC, AI/ML, cloud, or large-scale enterprise infrastructures.
  • Exposure to InfiniBand or RoCE technologies in HPC environments.
  • Familiarity with Kubernetes networking concepts including CNI plugins, network policies, and service networking.
  • Experience with scripting and automation tools such as Python, Ansible, or Terraform.
  • Familiarity with Linux-based operating systems and operational troubleshooting.
  • Strong analytical, troubleshooting, and problem-solving skills.
  • Good communication and collaboration skills in cross-functional technical environments.
Preferred Skills / Qualifications
  • Experience supporting GPU-based AI infrastructure or HPC clusters.
  • Exposure to cloud networking concepts across AWS, Azure, or hybrid cloud environments.
  • Familiarity with monitoring and observability platforms.
  • Understanding of CI/CD, Git, and modern DevNet practices.
  • Relevant certifications such as CCNP, JNCIP, PCNSE, or equivalent.
  • Experience operating within large-scale, hyperscale, or regulated infrastructure environments.