Data Engineer
Job Description
About AI71:
AI71 is an industry leader in artificial intelligence, delivering innovative solutions that empower developers, businesses and governments to solve complex challenges. AI71 builds secure, enterprise-ready applications powered by cutting-edge technology tailored for knowledge workers and sector-specific needs. AI71 bridges the gap between advanced AI and real-world impact. Guided by a strong commitment to research and responsibility, we create transformative solutions that drive progress and empower communities.
The Role:
As a Data Engineer in our organization, you will be instrumental in developing scalable, robust, and efficient data pipelines to support the development of enterprise-grade generative AI products and intelligent agents. You will work closely with cross-functional teams to process and structure data that fuels our AI models, ensuring seamless integration and delivery of insights to enterprise clients.
What You'll Do:
Data Infrastructure Development:
- Design, implement, and maintain scalable data pipelines tailored for training and fine-tuning generative AI models.
- Build and optimize data architectures for extracting, transforming, and loading (ETL/ELT) processes, specifically for high-dimensional and unstructured datasets.
Data Integration and Management:
- Integrate enterprise data from diverse sources, including APIs, cloud-based storage, and proprietary databases, ensuring compliance with security and privacy standards.
- Develop and maintain large-scale data warehouses and data lakes to support model training and analytics.
Collaboration:
- Partner with data scientists and AI researchers to understand data requirements for model development and evaluation.
- Work closely with product managers and engineers to ensure data solutions align with business and product goals.
Performance and Optimization:
- Monitor and optimize data workflows to handle large-scale, real-time data streams used by AI agents.
- Ensure data systems are scalable, secure, and cost-effective for enterprise-grade applications.
Data Quality and Compliance:
- Implement systems to ensure the quality, accuracy, and consistency of training data.
- Ensure compliance with enterprise data governance policies, privacy regulations (e.g., GDPR, CCPA), and AI ethics guidelines, following the guidance provided by the organization.
What You'll Bring:
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 6+ years of proven experience in building and maintaining large-scale data pipelines and architectures.
- Proficiency in Python, plus Java or Scala.
- Expertise in SQL and experience with distributed data processing and streaming tools such as Apache Spark, Hadoop, or Kafka.
- Familiarity with cloud platforms (AWS, GCP, or Azure) and their data-related services (e.g., S3, BigQuery, or Azure Data Lake).
- Experience working with unstructured data such as text, images, or videos.
Preferred Qualifications:
- Knowledge of MLOps and AI lifecycle workflows.
- Experience with generative AI technologies and frameworks (e.g., Hugging Face, OpenAI API).
- Familiarity with enterprise data tools like Snowflake or Databricks.
- Understanding of natural language processing (NLP) concepts and techniques.
Why AI71:
- Mission-Driven Work: Work on cutting-edge AI applications with a talented and passionate team, solving real-world challenges in critical sectors.
- Unparalleled Opportunity: This is a chance to innovate with AI at a company with unique access to world-leading models and resources.
- Career Growth: We offer competitive compensation, benefits, and significant career growth opportunities as a foundational member of the team.
- World-Class Environment: Enjoy a flexible working environment and the latest tools & technologies needed to do your best work.