Senior/Mid Site Reliability Engineer

Location-Type: Remote (Global)

Role Overview: As a Site Reliability Engineer (SRE) at the senior or mid level, you will ensure the reliability, scalability, performance, and security of our critical systems and infrastructure. You will work across the full lifecycle of our production systems and contribute to mission-critical projects that improve how we operate and deliver our digital services.

Key Responsibilities:

  • Design, implement, and maintain highly available, scalable, and resilient cloud-native infrastructure.
  • Develop and optimize automation tools and processes for deployment, monitoring, alerting, and incident response.
  • Implement and manage robust monitoring and logging solutions to gain deep insights into system health and performance.
  • Participate in on-call rotations, respond to and resolve critical incidents, and perform root cause analysis to prevent recurrence.
  • Collaborate closely with software development teams to promote SRE best practices, improve system design, and enhance observability.
  • Drive initiatives for performance tuning, capacity planning, and cost optimization of cloud resources.
  • Ensure security best practices are integrated into infrastructure and application deployments.

Required Skills & Experience:

  • 5+ years of professional experience in a Site Reliability Engineering, DevOps Engineering, or similar role focused on production systems.
  • Strong expertise in managing and operating services on major cloud platforms (e.g., AWS, Azure, or Google Cloud Platform).
  • Proven experience with containerization technologies (e.g., Docker) and orchestration platforms (e.g., Kubernetes).
  • Proficiency in scripting and automation using languages like Python, Bash, or Go.
  • Hands-on experience with CI/CD pipelines (e.g., Jenkins, GitLab CI, Azure DevOps, GitHub Actions).
  • Extensive experience with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK Stack, Datadog, Splunk).
  • Solid understanding of networking concepts, distributed systems, and microservices architectures.
  • Experience with infrastructure as code (IaC) tools (e.g., Terraform, CloudFormation, Ansible).
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.