Reliability Manager

Khazna Data Centers, Dubai

Job Description

Khazna was founded in 2012 and has grown rapidly to become the leading and trusted wholesale data center provider in the Middle East and North Africa region. Through our data centers, we provide industry-benchmark levels of power supply and cooling services to serve the growing need for data center operations in the UAE and the wider region.

We are looking for a Reliability Manager, a centralized role within Data Center Engineering & Asset Management that reports to the Executive Director, DC Engineering & Asset Management. This role drives strategic reliability initiatives across all data centers, enhancing the resilience of critical infrastructure through standardized best practices, predictive maintenance, and data-driven decision-making.

The Reliability Manager is responsible for identifying potential failure modes, implementing proactive risk mitigation strategies and optimizing asset performance to improve data center efficiency, uptime, and cost-effectiveness.

Key Accountabilities:

  • Develop and implement data center reliability engineering strategies to enhance infrastructure performance and minimize failures.
  • Conduct Root Cause Analysis (RCA) and Failure Mode and Effects Analysis (FMEA) for critical data center systems.
  • Monitor and analyse data center system performance data to identify reliability risks and mitigate failure points.
  • Lead downtime reduction initiatives to improve overall data center availability and resilience.
  • Implement preventive and predictive maintenance programs for data center critical infrastructure.
  • Utilize data analytics, IoT and machine learning tools for predictive maintenance and failure forecasting.
  • Develop and maintain a comprehensive asset management strategy, including lifecycle planning, upgrades and decommissioning.
  • Conduct capacity runway assessments to forecast data center infrastructure needs and optimize asset utilization.
  • Define and execute availability management plans, including risk assessments, mitigation strategies and outage impact analysis.
  • Collaborate with engineering, operations and vendor teams to drive continuous improvement in data center reliability.
  • Generate reliability reports and present findings to executive leadership to support data-driven decision-making.
  • Foster a culture of reliability and continuous improvement within the data center engineering and operations teams.
  • Implement condition-based monitoring and predictive analytics to detect early signs of system degradation.
  • Perform lifecycle cost analysis for key assets to optimize maintenance strategies and investment planning.
  • Identify End-of-Life (EOL) assets and develop decommissioning plans, ensuring compliance with sustainability and operational requirements.

Minimum Qualifications:

  • Bachelor's/master's degree in engineering (Electrical, Mechanical or Computer Science).
  • Certified Reliability Engineer (CRE) or equivalent.

Job-Specific Skills (Generic/Technical):

  • 8+ years of experience in reliability engineering, preferably in Data Center or critical infrastructure environments.
  • Proven experience in implementing Reliability-Centered Maintenance (RCM) strategies and reliability tools.
  • Expertise in using Computer-Aided Facility Management (CAFM) systems and reliability tools.
  • Strong understanding of Root Cause Analysis (RCA) and Failure Mode and Effects Analysis (FMEA).
  • Experience with Six Sigma and Lean methodologies for process optimization.
  • Proven ability to troubleshoot technical issues and manage incident response.

Additional skills:

  • Strong analytical and problem-solving skills to identify and mitigate reliability risks.
  • Effective stakeholder management to collaborate with Operations, Engineering, and Asset Management teams.
  • Ability to drive a reliability-focused culture through training, knowledge sharing and process optimization.
  • Strong communication and reporting skills for conveying reliability insights to senior leadership.

Trainings/Certifications Preferred:

  • CRL (Certified Reliability Leader)
  • Vibration analysis level 1 certification
  • Advanced FMEA Training
  • Certified Maintenance and Reliability Professional (CMRP)
  • ISO 55000 Asset Management Training