about the company
Cloud MNC industry
We are looking for a minimum of 3 years' experience in Cloud Computing (IaaS/PaaS/SaaS), DevOps, or Enterprise Architecture, with a proven track record of supporting Fortune 500 or large-scale enterprise customers.
...
about the job
- Complex Incident Management: Beyond daily consulting, act as the final escalation point for L1 issues. Lead the troubleshooting of high-priority (P0/P1) incidents involving complex hybrid cloud architectures.
- Product & Engineering Synergy: Provide deep-dive technical insights to R&D teams. Influence the product roadmap by identifying systemic architectural flaws and proposing optimization solutions.
- Customer Success & Risk Mitigation: Conduct proactive technical audits and architectural reviews for Key Accounts (KA). Use diagnostic tools not just to "avoid risks" but to design high-availability (HA) and disaster recovery (DR) strategies.
- Knowledge Empowerment: Create and maintain high-quality technical documentation, troubleshooting playbooks, and internal Knowledge Base (KB) articles to improve the team’s overall technical capability.
- Project Leadership: Lead complex cloud migration projects and large-scale system troubleshooting under high pressure.
Requirements:
- Deep understanding of Linux/Windows kernel tuning and performance optimization.
- Advanced Networking: Expert knowledge of VPC, BGP, VPN, Express Connect (Direct Connect), and SD-WAN. Ability to analyze packet loss and latency at a professional level using tools like Wireshark and tcpdump.
- Database & Big Data: Not just "familiar," but capable of performance tuning and migration for at least two engines (e.g., MySQL AND Redis/MongoDB).
- Cloud-Native & Modern Tech:
  - Proficiency in Containerization (Docker/Kubernetes) and Microservices.
  - Hands-on experience with Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- Automation & Scripting: Strong ability to automate repetitive tasks using Python, Go, or Shell to improve support efficiency (SRE mindset).
- Crisis Communication: Ability to remain calm and communicate effectively with customer CTOs/IT Directors during major outages.
- Analytical Thinking: Strong logical reasoning to perform complex Root Cause Analysis (RCA).
about the manager/team
You will be working with a global team.