Building the Future of Open Finance

Payward - the parent company behind Kraken, NinjaTrader, Breakout, xStocks, Payward Services and CF Benchmarks - has spent the last 15 years building one of the most modern and globally accessible financial infrastructure platforms in the industry, designed to advance an open, global financial system.

Before you apply, we encourage you to explore our culture page to understand what drives us and how we work.

The team

Join our engineering team and play a pivotal role in upholding the reliability, scalability, and efficiency of our platform. As a Site Reliability Engineer (SRE), you will collaborate closely with cross-functional teams to design, build, and operate the foundational infrastructure systems that power our applications and services.

As a key member of our SRE team, you will guarantee the availability, high performance, scalability and cost efficiency of our critical services and platforms. This is an excellent opportunity for engineers who are passionate about automation, cloud technologies, distributed systems, monitoring, logging and maintaining highly available financial platforms.

You will work closely with Software Engineers, Security Engineers, and Platform teams to improve operational excellence and support mission-critical financial services. You will participate in system monitoring, incident response, automation initiatives, and infrastructure improvements while learning best practices for operating large-scale, highly regulated environments.

The Opportunity

Implement self-service data infrastructure solutions that support the needs of dozens of business units and hundreds of engineers.
Utilize Infrastructure as Code (IaC) principles to design, provision, and manage both on-premises and cloud (AWS) infrastructure components using tools such as Terraform.
Develop and maintain automation scripts using bash/shell scripting to automate operational tasks and deployments.
Enhance and manage CI/CD pipelines to facilitate consistent software deployments across the data infrastructure.
Implement robust data monitoring and alerting solutions to proactively detect anomalies and performance issues.
Manage and implement role-based access control (RBAC) and permissions for a multitude of user groups and machine workflows across different environments.
Utilize Kubernetes and Nomad to manage containerized applications within the data infrastructure, ensuring efficient deployment, scaling, and orchestration.
Implement effective incident response procedures and participate in on-call rotations.
Collaborate with data analysts, engineers, and cross-functional teams to understand requirements and implement appropriate solutions.
Document architecture, processes, and best practices to enable knowledge sharing and support continuous improvement.

What you Bring

Bachelor’s degree in Computer Science, Software Engineering, or a related field (or equivalent experience).
1+ years of proven experience working as a Site Reliability Engineer, Infrastructure/Platform/DevOps Engineer, Software Engineer, or in a similar role.
Ability to leverage AI tools and agents such as Claude and OpenAI to efficiently deliver business value
Solid understanding of bash/shell scripting and proficiency in at least one programming language (preferably Python, Golang or Rust).
Experience with containerization tools such as Docker or Podman
Strong problem-solving skills and the ability to troubleshoot complex systems.

Nice to haves

Experience managing and operating data systems such as Kafka, Redis, Elasticsearch, MariaDB, Airflow, Debezium, ScyllaDB, TiDB, HashiCorp Vault
Experience managing self-hosted and SaaS platforms such as Splunk, VictoriaMetrics, Grafana, Cloudflare, Ingresses, GitLab
Experience running Kubernetes as a Platform offering for engineering teams
Experience managing Kubernetes workloads at scale on AWS or on-premises
Experience with Infrastructure as Code tools such as Terraform, Terragrunt and Atlantis
Familiarity with CI/CD deployment pipelines and related tools.
Working experience in managing AWS infrastructure components

Unless a specific application deadline is stated in the job posting, applications are accepted on an ongoing basis.

Please note, applicants are permitted to redact or remove information on their resume that identifies age, date of birth, or dates of attendance at or graduation from an educational institution.

We consider qualified applicants with criminal histories for employment on our team, assessing candidates in a manner consistent with the requirements of the San Francisco Fair Chance Ordinance.

Payward is powered by people from around the world and we celebrate the diverse talents, backgrounds, contributions, and unique perspectives that everyone brings to the table. We hire based on merit, seeking out people with the right abilities, knowledge, and skills for the job. We encourage you to apply for roles where you don't fully meet the listed requirements, especially if you're passionate or knowledgeable about crypto.

We may ask candidates to complete job-related skills or work-style assessments as part of our hiring process. These assessments evaluate competencies relevant to the role and are applied consistently across candidates for similar positions. Results are considered alongside experience and interviews, and are not the sole basis for any employment decision.

As an equal opportunity employer, we don't tolerate discrimination or harassment of any kind, whether based on race, ethnicity, age, gender identity, citizenship, religion, sexual orientation, disability, pregnancy, veteran status, or any other protected characteristic as outlined by federal, state, or local laws.

Kraken

Site Reliability Engineer - Core Infrastructure - Kraken
