Senior/Mid Site Reliability Engineer

Location Type: Remote (Global)

Role Overview: As a Senior/Mid Site Reliability Engineer (SRE), you will play a key part in ensuring the reliability, scalability, performance, and security of our critical systems and infrastructure. You will contribute to mission-critical projects that drive operational excellence and continuous improvement in our digital service delivery, working across the full lifecycle of our production systems.

Key Responsibilities:

Design, implement, and maintain highly available, scalable, and resilient cloud-native infrastructure.
Develop and optimize automation tools and processes for deployment, monitoring, alerting, and incident response.
Implement and manage robust monitoring and logging solutions to gain deep insights into system health and performance.
Participate in on-call rotations to respond to and resolve critical incidents, performing root cause analysis to prevent recurrence.
Collaborate closely with software development teams to promote SRE best practices, improve system design, and enhance observability.
Drive initiatives for performance tuning, capacity planning, and cost optimization of cloud resources.
Ensure security best practices are integrated into infrastructure and application deployments.

Required Skills & Experience:

5+ years of professional experience in a Site Reliability Engineering, DevOps Engineering, or similar role focused on production systems.
Strong expertise in managing and operating services on major cloud platforms (e.g., AWS, Azure, or Google Cloud Platform).
Proven experience with containerization technologies (e.g., Docker) and orchestration platforms (e.g., Kubernetes).
Proficiency in scripting and automation using languages like Python, Bash, or Go.
Hands-on experience with CI/CD pipelines (e.g., Jenkins, GitLab CI, Azure DevOps, GitHub Actions).
Extensive experience with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK Stack, Datadog, Splunk).
Solid understanding of networking concepts, distributed systems, and microservices architectures.
Experience with infrastructure as code (IaC) tools (e.g., Terraform, CloudFormation, Ansible).
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.