about the job
- Lead a newly formed Site Reliability Engineering function, playing a key role in establishing core systems and operational practices from the ground up.
- Guide a team of SREs to maintain high service reliability, develop scalable infrastructure, and optimize system performance across multiple regions.
- Own the design and improvement of automation workflows, deployment pipelines, and incident response strategies to support seamless product delivery.
- Collaborate closely with cross-functional teams to align infrastructure goals with broader engineering objectives and business needs.
skills & experience
- Extensive hands-on experience managing large-scale systems with cloud platforms, containerized environments, and infrastructure-as-code tools.
- Proven leadership in building or managing SRE or DevOps teams with strong analytical, decision-making, and communication skills.
- Deep knowledge of monitoring, logging, and alerting tools, along with practical understanding of SLOs, SLAs, and reliability metrics.
- At least 8 years in relevant technical roles, with a solid foundation in scripting or programming and a proactive approach to problem-solving.
If this is the role for you, click "Apply Now" to have a confidential conversation about your career.
Only shortlisted candidates will be contacted.