Senior AI Infrastructure Engineer
Sydney | Permanent | Hybrid
We're partnering with a global technology business at the forefront of AI infrastructure, high-performance computing and next-generation data centre platforms.
As demand for large-scale AI workloads continues to accelerate, we're seeking a Senior AI Infrastructure Engineer to help design, build and operate the critical infrastructure that powers AI training, inference and GPU-intensive environments.
This is a rare opportunity to work on genuinely cutting-edge infrastructure challenges, supporting large-scale compute environments that sit at the heart of modern AI platforms.
As a Senior AI Infrastructure Engineer, you'll play a key role in designing and operating highly scalable infrastructure platforms supporting AI, HPC and GPU workloads. You'll work alongside infrastructure, platform, networking and software engineering teams to deliver resilient, automated and high-performing environments.
This role would suit someone from a Platform Engineering, DevOps, SRE, Linux Engineering, HPC or GPU Infrastructure background who enjoys solving complex technical challenges at scale.
In this role, you'll:
- Design, deploy and support AI and GPU infrastructure platforms.
- Build scalable, automated infrastructure using Infrastructure as Code principles.
- Optimise performance, reliability and efficiency across compute-intensive environments.
- Support large-scale Linux-based platforms and distributed systems.
- Work closely with engineering teams to deliver production-ready infrastructure solutions.
- Improve monitoring, observability, automation and operational excellence across critical systems.
- Contribute to platform architecture, engineering standards and best practices.
- Troubleshoot complex infrastructure, networking and performance issues.
You'll likely bring experience across several of the following areas:
- Strong background in infrastructure engineering, platform engineering, DevOps or Site Reliability Engineering.
- Deep Linux systems administration and troubleshooting experience.
- Experience with cloud platforms including AWS, Azure or GCP.
- Strong Infrastructure as Code capability using Terraform or similar tooling.
- Experience with Kubernetes and container platforms.
- Automation and scripting experience using Python, Bash or similar.
- Strong understanding of networking, storage and distributed systems.
- Experience operating highly available, mission-critical environments.
- Experience supporting GPU-based infrastructure.
- Exposure to AI/ML platforms, model training environments or inference workloads.
- Experience with HPC environments and workload schedulers.
- Knowledge of high-performance networking technologies such as InfiniBand or RDMA.
- Experience supporting large-scale compute environments.
What's on offer:
- Opportunity to work on genuine AI infrastructure initiatives.
- Exposure to large-scale GPU and HPC environments.
- Complex technical challenges with significant engineering ownership.
- Collaborative, high-performing engineering culture.
- Strong career growth opportunities within a rapidly evolving technology domain.
- Competitive salary and benefits package.
If you're passionate about infrastructure, automation, distributed systems and enabling the future of AI, we'd love to hear from you.