Senior Staff Engineer in Public Cloud Operations

We are seeking a highly skilled Senior Staff Engineer to join our team, focusing on the management and optimization of our hybrid cloud infrastructure. This role requires deep expertise in managing resources across leading public cloud platforms, ensuring system performance, security, and compliance at an enterprise level. The ideal candidate will be instrumental in orchestrating unified operations, driving cost-efficiency, and maintaining high availability across our global cloud environments.

Core Responsibilities

Cloud Platform Architecture and Operations

Design, deploy, and maintain robust cloud infrastructures utilizing key services from AWS and Alibaba Cloud. This includes compute instances (EC2, ECS), object storage (S3, OSS), networking (VPC, CEN), serverless functions (Lambda, Function Compute), and managed Kubernetes services (EKS, ACK). A critical part of this role is architecting highly available systems that can scale automatically based on demand while optimizing network topologies and resource configurations for peak performance.

Monitoring and Incident Management

Implement comprehensive, full-stack monitoring solutions using a combination of cloud-native tools like AWS CloudWatch and Alibaba Cloud CloudMonitor, alongside open-source stacks such as Prometheus with Grafana and the ELK stack. You will lead the response to critical incidents, perform thorough root cause analyses, and establish preventive measures to address issues like resource contention, configuration errors, and network latency.

Cost Optimization and Resource Management

Analyze cloud spending and resource utilization patterns to identify and implement significant cost-saving opportunities. Strategies include purchasing reserved instances, configuring auto-scaling policies, and implementing intelligent storage lifecycle management. You will also establish and enforce resource quota frameworks to prevent waste and control expenditures across departments.

Security and Compliance

Implement and enforce stringent cloud security baselines. This involves managing security groups, identity and access management policies (AWS IAM, Alibaba Cloud RAM), and utilizing security services like AWS Security Hub. You will conduct regular security audits, remediate vulnerabilities, and design granular access controls. Ensuring comprehensive auditing via tools like AWS CloudTrail and Alibaba Cloud DAS is also a key requirement.

Collaboration and Knowledge Sharing

Work closely with development teams to optimize application architectures for the cloud, advocating for modern approaches like microservices and serverless computing. A significant part of your role will be to document standard operating procedures and lead internal technical training sessions to elevate the entire team's cloud capabilities.

Required Qualifications and Skills

Technical Expertise

Professional Experience

Soft Skills and Education

Additional Valued Skills

What We Offer

We provide a competitive total compensation package designed to attract top talent. Our benefits include extensive Learning & Development programs with education subsidies to support your continuous growth. We foster a strong community through various team-building programs and company events. Employees also enjoy wellness and meal allowances, alongside comprehensive healthcare schemes that extend to dependants.

We are committed to creating a rewarding and diverse environment, united by a culture that emphasizes teamwork, integrity, and results. For a deeper look at how you can contribute to our forward-thinking team, we encourage you to explore this career opportunity further.

Frequently Asked Questions

What is the primary focus of a Senior Staff Engineer in Public Cloud Operations?
This role focuses on the end-to-end lifecycle management of a large-scale hybrid cloud infrastructure. The engineer is responsible for ensuring high availability, optimizing performance and costs, and maintaining strict security and compliance standards across AWS and Alibaba Cloud environments.

What are the key technical skills required for this position?
Key skills include mastery of core AWS or Alibaba Cloud services, proficiency in automation with Shell/Python/Ansible, and hands-on experience with container orchestration using Kubernetes and its managed services (EKS, ACK). Experience designing highly available and auto-scaling architectures is crucial.

How does this role contribute to cost management?
The engineer analyzes cloud resource usage to identify savings opportunities through strategies like reserved instances, auto-scaling, and intelligent storage tiering. They also establish resource quota management systems to prevent overspending and ensure efficient use of cloud budgets.

What is the importance of security in this cloud operations role?
Security is paramount. The engineer implements security baselines, manages IAM/RAM permissions, conducts regular audits, and remediates vulnerabilities. They design granular access controls and ensure comprehensive auditing is in place to protect enterprise assets and data.

Does this role require collaboration with other teams?
Yes, extensive collaboration is essential. The engineer works closely with development teams to optimize application architectures for the cloud and provides internal training and documentation to share knowledge and standardize operational best practices across the organization.

What kind of experience is preferred for candidates?
Beyond the 5+ years in operations, preference is given to candidates with experience building cloud platforms from scratch, designing hybrid cloud solutions, or leading large-scale migration projects. Familiarity with multi-cloud management and FinOps is also highly valued.