Engineering Team Lead, SRE - Real-time Data

Overview
Location: London
Business Area: Engineering and CTO
Ref #: 10044820

Bloomberg’s Real-time Data group is responsible for distributing low-latency, high-volume financial data to users around the world. From equity prices to FX rates, our infrastructure handles over 60 billion messages per day from 370+ global exchanges, powering 375,000 Terminals and 3,000+ BPIPE clients across on-prem and cloud environments. The London Real-time Data SRE team plays a critical role in making this possible by developing the core services and tooling that ensure our systems are reliable, scalable, and observable.

The Opportunity
We’re looking for an experienced engineering manager to lead a team of software and SRE engineers. This is a hands-on leadership role where you’ll be accountable for both technical execution and people development. You’ll shape the team’s roadmap, drive production readiness, and grow a high-performing, collaborative engineering culture.

What You’ll Own
You’ll lead a team that supports several key components of the Real-time Data platform:
- Configuration Delivery Services: Enables thousands of servers and BPIPE endpoints to “call home” and receive correct settings.
- Peer Discovery Infrastructure: Groups servers into discoverable clusters and provides tools to manage them.
- Observability and Monitoring Frameworks: Ensures we have high visibility across a vast estate of global infrastructure.
- Data Quality Tooling: UI and backend systems for diagnosing distribution issues across the real-time data network.
- Cross-team Reliability Work: You’ll help improve the reliability of systems beyond the team’s formal ownership.

Leadership and Responsibilities
As the team’s leader, you will:
- Manage the career growth, performance, and mentorship of software engineers
- Drive hiring, onboarding, and long-term team culture
- Stay hands-on: participate in technical design, and lead incident response when necessary

You’ll balance operational excellence with software development, helping your team deliver tools, services, and processes that scale with the business.

How the Team Operates
The team’s mission aligns with five SRE pillars:
- Latency Monitoring and Management: Define SLIs/SLOs, track latency, and build tools to diagnose issues.
- Capacity Management: Maintain disaster readiness and scalability through monitoring and forecasting.
- System Observability: Proactively detect issues, build alerting systems, and centralize health dashboards.
- Production Risk Management: Ensure safe software releases and drive infrastructure improvements.
- Incident Response: Lead or support fast, effective remediation during live incidents, and build automation for common operational issues.

What We’re Looking For
We’re seeking a leader who can combine strong technical execution with people-first leadership. You’ll guide the team’s roadmap, help individuals grow, and contribute to the broader reliability strategy across Real-time Data.

You’ll need to have:
- Proven experience directly managing software engineers in a production environment
- Strong hands-on development skills in an object-oriented language (Python or C++ preferred)
- A background in building reliable, well-tested software for production systems
- Confidence diagnosing and resolving live operational issues
- Strong communication skills, with the ability to work across teams and influence peers
- A track record of helping teams plan, prioritize, and deliver complex technical projects
- The ability to define a long-term vision for the team’s technology and culture

Bonus
- Background in SRE, infrastructure, or high-throughput distributed systems
- Familiarity with observability tooling, configuration management, or peer discovery patterns