Site Reliability Engineer

29647
  • Negotiable
  • Singapore, Asia Pacific
  • Permanent

About Our Client:

  • A pioneering AI innovation leader

Key Responsibilities:

  1. Manage production-grade container ecosystems (Kubernetes/Docker) and open-source component clusters across multiple business units

  2. Develop infrastructure operation platforms encompassing CI/CD pipelines, monitoring/alerting systems, and centralized logging solutions

  3. Execute rapid incident response protocols to maintain service continuity and minimize downtime

  4. Optimize system architecture and deployment strategies to ensure 99.9%+ service availability

  5. Spearhead automation programs to streamline operations and eliminate manual processes

  6. Partner with engineering teams to implement infrastructure-as-code (IaC) principles and reliability patterns

  7. Maintain 24/7 operational readiness through rotational on-call support

Qualifications:

  • 5+ years in SRE/DevOps roles managing large-scale distributed systems

  • Expert-level proficiency with AWS/Azure/GCP cloud ecosystems

  • Advanced Linux administration skills with hands-on maintenance experience

  • Scripting mastery in Python/Shell for operational automation

  • Deep technical expertise in optimizing Nginx, JVM, Redis, Kafka, and SQL/NoSQL datastores

  • Production experience managing Kubernetes clusters and containerized workloads

  • CI/CD implementation experience using GitLab CI/ArgoCD or comparable tools

  • Proven ability to diagnose complex system failures under time constraints

  • Effective remote collaboration skills across technical teams

  • Self-driven work ethic with strong technical ownership mentality

  • Full professional fluency in English and Chinese



Rachel Mou, Divisional Director
Copyright First Point Group 2022