The impact you will have:
As Staff MLOps Engineer, you will define and build Elliptic's Enterprise MLOps platform. Elliptic has growing ML capability across several teams, an established model registry, and a maturing model risk management practice. What is missing is the unified platform layer that ties training, deployment, monitoring, and governance together into a coherent, scalable discipline. You will be responsible for creating that layer.
Your platform will serve four distinct internal consumers, each with different needs:
- Product Engineering teams building customer-facing models and models that analyse customer data, who need reproducible training pipelines, CI/CD for model deployment, and low-latency serving infrastructure
- Intelligence Research, working on frontier intelligence collection, predictive pre-screening models, and behavioural pattern detection, who need rapid experimentation, GPU orchestration, and dataset versioning
- InfoSec, who own the model registry and model risk management framework today and need the platform to close execution gaps in audit trails, drift monitoring, and compliance reporting
- Operations, who own BI, usage prediction, and revenue opportunity signalling, and need scheduled batch inference, BI integration, and pipeline reliability
The platform you build must enforce governance with enough rigour to satisfy a regulated financial crime context, while remaining flexible enough to avoid slowing down research teams who need to iterate quickly.
This is a role for someone who has built ML infrastructure from the ground up before, who understands that a platform succeeds only when it is adopted, and who is comfortable making build-vs-buy decisions that others will adopt and use for years.
What you will do:
- Define the target-state MLOps architecture for Elliptic, covering model training pipelines, serving infrastructure, monitoring, feature management, and governance, and produce the architecture decision records that inform investment decisions
- Make and document build-vs-buy-vs-stop recommendations with clear cost modelling and trade-off analysis, evaluating vendors, open-source tools, and managed services against Elliptic's constraints (AWS-primary, Databricks ecosystem)
- Work with InfoSec to improve the existing model registry and model risk management framework, closing identified gaps in metadata, lineage, approval workflows, and drift/bias detection
- Build model training pipelines, CI/CD for ML, and serving infrastructure, working directly with a small group of infrastructure engineers to ship production-grade platform capabilities
- Instrument observability across the ML lifecycle: training metrics, serving latency and throughput, data quality, and prediction drift, integrating with Elliptic's existing observability stack
- Work directly with data scientists and ML engineers across all four consumer groups to onboard them onto the platform, writing documentation, runbooks, and reference architectures that lower the barrier to self-service
You will be a great fit here if you:
- Have built MLOps platforms or ML infrastructure from the ground up, and can speak to what worked, what didn't, and why
- Have operated in a regulated industry (e.g. compliance, financial services) and have hands-on experience building ML infrastructure that meets those regulatory demands
- Think about ML infrastructure the way the best platform engineers think about data infrastructure: as a set of foundations with internal customers whose needs must be understood and balanced
- Are comfortable operating in ambiguity, making decisions with incomplete information, and creating structure where none exists, while remaining open to changing course when better information arrives
- Influence through clarity, evidence, and the quality of your work rather than positional authority. You earn adoption by making the platform genuinely better than the alternative
- Care about production engineering quality: you write production-grade code, and your systems are tested, observable, documented, and designed for others to operate
Our ideal candidate has:
- Deep hands-on experience building MLOps platforms, including model registries, feature stores, and ML pipeline orchestration
- Working knowledge of model serving patterns: real-time inference, batch prediction, A/B deployment, and other deployment strategies
- AWS infrastructure experience (ECS/EKS, S3, IAM, networking) and comfort operating in a Databricks ecosystem or equivalent lakehouse architecture
- Experience with model monitoring: model evaluation, data drift detection, prediction drift, and performance degradation alerting
- A track record of building something from zero and bringing it to a state where others could operate and extend it
- Experience in a regulated industry (fintech, financial services, healthcare) where model governance is a compliance requirement
- A conviction that AI is a core part of how modern engineering gets done, not a passing trend. You actively use it to think faster, prototype faster, and pressure-test your own designs, and you're excited that the bar keeps rising.
- Prior experience running formal build-vs-buy evaluations with written decision records
Bonus points for:
- Familiarity with model risk management frameworks and the ability to connect governance practices to regulatory expectations
- Experience working simultaneously with research-oriented ML teams and production-oriented engineering teams, and understanding how their needs diverge
- Infrastructure-as-code fluency (Terraform)
- Experience with ClickHouse or similar OLAP engines for low-latency ML feature serving
- Blockchain or crypto domain knowledge
- Experience working in fraud detection and modelling
- Contributions to open-source MLOps tooling
Job Benefits
How we work:
- Hybrid working and the option to work from almost anywhere for up to 90 days per year
- £500 remote working budget to set up your home office space
Learning & Development:
- $1,000 Learning & Development budget to use on anything (agreed with your manager) that contributes to your growth and development
Vacation/Leave:
- Holidays: 25 days of annual leave + bank holidays
- An extra day for your birthday
- Enhanced parental leave: we provide eligible employees, regardless of gender or whether they become a parent by birth or adoption, with 16 weeks of fully paid leave.
Benefits:
- Private Health Insurance - we use Vitality!
- Full access to Spill Mental Health Support
- Life Assurance: we hope you will never need it, but our cover pays 4 times your salary to your beneficiaries
- £100 in crypto for you!
- Cycle to Work Scheme

