We don’t think about job roles in a traditional way. We are anti-silo. Anti-career stagnation. Anti-conventional.
Beyond ONE is a digital services provider radically reshaping the personalised digital ecosystems of consumers in high growth markets around the world. We’re building a digital services aggregator platform, with a strong telco foundation, and a profitable growth strategy that empowers users to drive their own experience—subscribe once, source from many, and only pay for what you actually use.
Since our founding in 2021, we’ve acquired Virgin Mobile MEA, Friendi Mobile MEA and Virgin Mobile LATAM (with 6.5 million subscribers), and we now have 1,600 dedicated colleagues across Chile, Colombia, KSA, Kuwait, Mexico, Oman and UAE.
To disrupt for good takes a rebellious spirit, a questioning mind and a warm heart. We really care about how to get things done and not who manages who. We benefit from our diversity, and together, we disrupt the way we and others think about our lives for good.
Do you want to exchange ideas, learn from each other and leave your mark on our journey? This is the place for you.
Role Purpose
Why this role matters: As a Site Reliability Engineer (SRE), you will play a key role in enhancing system reliability, scalability, and performance through automation, monitoring, and operational excellence. Your contributions will help shape our reliability engineering practices and platform stability, ultimately transforming how we deliver resilient and scalable services to users.
What success looks like: In your first year, you will:
- Build and maintain automated systems to improve service uptime and incident response.
- Implement and refine monitoring and alerting strategies to proactively detect issues.
- Drive operational efficiencies by reducing toil and introducing reliability-focused tooling.
Why this is for you: If you're keen on solving availability, latency, and performance issues at scale, hit us up. We're looking for someone ready to tackle this challenge head-on and make an impact from day one.
Key Responsibilities
In this role, you will:
- Lead the development of resilient, highly available systems and incident response strategies.
- Collaborate with software and infrastructure teams, driving reliability and observability initiatives.
- Manage production infrastructure and environments, ensuring optimal performance and uptime.
- Automate operational tasks using infrastructure-as-code and scripting tools.
- Design and maintain monitoring and alerting systems using Prometheus, Grafana, or similar.
- Conduct blameless postmortems and implement learnings to prevent future incidents.
- Implement SLOs, SLIs, and error budgets to guide engineering decisions.
- Optimize CI/CD pipelines and deployment processes for reliability and speed.
- Engage with stakeholders to align reliability goals with business outcomes.
Qualifications & Attributes
We’re seeking someone who embodies the following:
Education: Bachelor’s degree in Computer Science, Engineering, or a related field.
Experience: 3+ years in Site Reliability Engineering, DevOps, or similar operational roles.
Technical Skills:
Must-haves:
- Strong background in Linux/Unix systems and network administration.
- Experience with cloud platforms (AWS, Azure, or GCP).
- Experience implementing SLOs, SLIs, and error budget policies.
- Proficiency in infrastructure automation (Terraform, Ansible) and scripting (Python, Go, or Bash).
- Deep understanding of monitoring, observability, and incident management tools (Prometheus, Grafana, Splunk, etc.).
- Solid grasp of CI/CD practices, containerization (Docker), and orchestration (Kubernetes).
Nice-to-haves:
- Familiarity with distributed systems, service meshes, and performance tuning.
Unique Attributes:
- Thrives in fast-paced environments requiring quick decision-making.
- Possesses a proactive mindset and a calm, analytical approach to troubleshooting under pressure.
- Excels at applying SRE best practices, modern ops philosophies, and large-scale system thinking.
What we offer:
- Rapid learning opportunities - we enable learning through flexible career paths and exposure to challenging, meaningful work that will help build and strengthen your expertise.
- Hybrid work environment - flexibility to work from home 2 days a week.
- Healthcare and other local benefits offered in market.
By submitting your application, you acknowledge and consent to the use of Greenhouse & BrightHire during the recruitment process. This may include the storage and processing of your data on servers located outside your country of residence. For further information, please contact us at dataprivacy@beyond.one.