Site Reliability Engineer

Ref. 29647 | Posted: 23/05/2025

Salary: Negotiable
Location: Singapore, Asia Pacific
Contract type: Permanent

About Our Client:

A pioneering AI innovation leader

Key Responsibilities:

Manage production-grade container ecosystems (Kubernetes/Docker) and open-source component clusters across multiple business units

Develop infrastructure operation platforms encompassing CI/CD pipelines, monitoring/alerting systems, and centralized logging solutions

Execute rapid incident response protocols to maintain service continuity and minimize downtime

Optimize system architecture and deployment strategies to ensure 99.9%+ service availability

Spearhead automation programs to streamline operations and eliminate manual processes

Partner with engineering teams to implement infrastructure-as-code (IaC) principles and reliability patterns

Maintain 24/7 operational readiness through rotational on-call support

Qualifications:

5+ years in SRE/DevOps roles managing large-scale distributed systems

Expert-level proficiency with AWS/Azure/GCP cloud ecosystems

Advanced Linux administration skills with hands-on maintenance experience

Scripting mastery in Python/Shell for operational automation

Deep technical expertise in optimizing Nginx, JVM, Redis, Kafka, and SQL/NoSQL datastores

Production experience managing Kubernetes clusters and containerized workloads

CI/CD implementation experience using GitLab CI/ArgoCD or comparable tools

Proven ability to diagnose complex system failures under time constraints

Effective remote collaboration skills across technical teams

Self-driven work ethic with strong technical ownership mentality

Full professional fluency in English and Chinese

Rachel Mou, Divisional Director

Copyright First Point Group 2022
