Site Reliability Engineer

Overview
We are looking for an experienced Site Reliability Engineer to join the Igloo team in Cambridge to champion observability and delivery. The candidate should have strong communication skills, experience in coaching or sharing knowledge, and proficiency in Azure and observability platforms.

Join Insurance Consulting and Technology (ICT) during a transformative period aimed at enhancing customer and business value. You'll be part of a high-performing team renowned for quality delivery, rapid development, and team spirit. We have won the InsuranceERM Best use of Cloud Technology award three years in a row.

Igloo is embarking on new and exciting uses of its technology. This role offers the opportunity to help the team and product handle complex, large-scale client propositions where observability will be essential, and to help transform how the product is designed and deployed.

You will join a cross-team guild of Site Reliability Engineers, which lets you not only influence direction within your product family but also help shape how we approach observability and monitoring across ICT.

This role is open to flexible and hybrid working arrangements, with a minimum of two days per week in the Cambridge office.

The Role
Collaborate with cross-functional teams to ensure the reliability, availability, and performance of our client-facing services
Maintain and configure observability platforms such as Datadog
Proactively monitor production and other environments to ensure stability, availability, security, and integrity
Design and implement automation and processes to improve the efficiency and effectiveness of the teams and other support functions
Engage with business stakeholders to gather requirements, address concerns, and provide updates on projects and system status
Contribute to the design, build, and operational management of the services
Lead incident response, troubleshooting, and root cause analysis to mitigate current issues and prevent future ones
Work closely with engineering, support, and operations teams to upskill them and promote knowledge transfer, producing training materials and articles
Participate in an on-call rotation to provide support and ensure system uptime

The Requirements
Experience as a Site Reliability Engineer or in a similar role (such as DevOps)
Familiarity with managing cloud-based services (ideally Azure), including observability, monitoring, scaling, and security
Hands-on experience with observability tools such as Datadog (or similar)
Knowledge of automation, scripting (Python or PowerShell), and Infrastructure as Code (e.g., Terraform, Pulumi, ARM Templates, or Bicep)
Experience with Azure DevOps Pipelines (or similar)
Strong interpersonal, verbal, and written communication skills
Ability to coach, mentor, and share knowledge with others
Experience collaborating with external clients and cross-functional teams
Customer-focused, with strong problem-solving skills

Other highly desirable but non-essential skills
Azure certifications, such as Azure Administrator, Azure Developer, or Azure DevOps Engineer
Experience with containerization and orchestration (Docker, Kubernetes)
Familiarity with programming languages such as C#
Knowledge of Configuration as Code tools (e.g., Puppet, Ansible)

At WTW, we believe difference makes us stronger. We want our workforce to reflect the different and varied markets we operate in and to build a culture of inclusivity that makes colleagues feel welcome, valued and empowered to bring their whole selves to work every day. We are an equal opportunity employer committed to fostering an inclusive work environment throughout our organisation. We embrace all types of diversity.

(ICT_TECH TD_2025_47R)