Sector: A major Telecommunications (Telco) provider with regional operations.
Infrastructure: Operates high-scale multi-cloud (AWS, Azure, and/or GCP) and hybrid environments integrated with the carrier network.
Service Standards: Delivers mission-critical services requiring "telco-grade" (99.99%+) availability and performance.
Operational Culture: Highly structured environment following strict ITSM/ITIL processes, change controls, and regulatory compliance.
Team Environment: Focuses on 24x7 operational excellence, proactive risk management, and continuous improvement through automation.
Education & Experience: Bachelor’s degree in IT, CS, or Engineering, plus 3–6 years of experience in cloud operations or infrastructure roles.
Cloud Proficiency: Hands-on experience managing production environments in AWS, Azure, or GCP.
Infrastructure as Code (IaC): Expert use of Terraform, Bicep, or CloudFormation to standardize and deploy resources.
Technical Troubleshooting: Ability to resolve complex platform issues using logs, metrics, and alerts in multi-cloud setups.
Domain Knowledge: Solid understanding of cloud networking, security, IAM, and IT service management (ITSM), including on-call practices.
Operational Reliability: Responsible for maintaining production platforms to meet strict telco performance targets and 24x7 readiness.
Incident Leadership: Lead on-call rotations and act as the primary technical lead for high-severity or complex cloud incidents.
Service Restoration: Drive rapid recovery within agreed SLA and MTTR targets while coordinating across network, security, and application teams.
Lifecycle Management: Perform Root Cause Analysis (RCA) and manage the full cycle of Incident, Problem, and Change Management.
Efficiency & Mentorship: Enhance resilience via automation and self-healing tools while providing technical guidance to junior engineers.