Backend / API Engineer, Reliability Infrastructure


Build a reliability platform powering economic growth 

Stripe powers businesses all over the world. We process payments, run marketplaces, detect fraud, help entrepreneurs start an internet business from anywhere in the world, build world-class developer-friendly APIs, and more. If you’re a software engineer here, you’ll get to build the systems that power our products and enrich our customers’ experiences.

Stripe doesn’t process quite as many requests as Twitter or Facebook, but we care a great deal about reliability. Every request we process matters to everyone involved, and we handle a large volume of financial transactions worldwide. We can’t go down, because our users’ businesses depend on us.

You’ll be on a team that builds products and tools for other teams at Stripe, such as reliability metrics and incident communication tools. You’ll make decisions with a significant impact on Stripe. There is a lot of work to do to make Stripe engineers’ work easier and our platform even more reliable than it is today, and we’d love for you to be part of it. This team plays an important role in increasing users’ confidence in Stripe. We’re close to the people using our systems, so we constantly get feedback we can use to make those systems better. Reliability is a fast-growing organization, where you’ll work with every engineering team at Stripe to help them build confidence that their offerings operate reliably and will continue to do so.

We’re looking for people with a strong background (or interest!) in developing new products and driving new processes. We’d love to hear from you whether you’re a seasoned software developer or have only just discovered that you might like working with databases or software applications. Many of our software engineers work remotely, and we’d be happy to talk with you about remote work options.

You will:

  • Design, build, and maintain the core reliability products used by all Stripes
  • Design, build, and maintain the reliability metrics data and reporting platform, used both internally and, when appropriate, externally
  • Own tools and processes for Stripe’s incident program, including facilitating remediation, tracking resolution, streamlining communications and reporting on an incident’s final impact
  • Provide guidance and recommendations to other Stripes on how to improve their teams’ reliability
  • Work closely with program managers and business partners to turn feedback into customer-centric features

We’re looking for someone who:

  • Thinks about systems — their edge cases, failure modes, and lifecycles
  • Knows their way around a Unix shell
  • Can debug complex problems across the whole stack
  • Focuses on the needs of our users, both internal and external
  • Holds themself and others to a high bar when working with production
  • Is able to write high-quality code in a programming language such as Ruby, Scala, or Go