You Build It You Run It sounds great… but it won’t work here!


Since You Build It You Run It was outlined in 2006, on-call product teams as an operating model has gone from being a controversial idea… to being a controversial idea. Enterprise organisations don’t do it, but they do talk about why they don’t do it.


your organisation needs to thrive and survive

you need to achieve a new baseline for your digital services

=>

  • Higher throughput: weekly or more frequent deployments
  • Greater reliability: 99.0+% availability, and time to repair in minutes
  • learning culture
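To make the availability target above more tangible, here is a minimal sketch that converts an availability percentage into a downtime budget. The function name and the 30-day period are assumptions made for illustration; they are not from the talk.

```python
def downtime_budget_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Return the downtime (in minutes) allowed over the period.

    Assumption: the period is measured in whole days, 30 by default.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# Compare the 99.0% baseline with a stricter, hypothetical 99.9% target.
for target in (99.0, 99.9):
    budget = downtime_budget_minutes(target)
    print(f"{target}% over 30 days allows about {budget:.1f} minutes of downtime")
```

At 99.0% over 30 days the budget is roughly 432 minutes (about 7.2 hours), which is why a time to repair measured in minutes matters.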

but there is a hard problem in there: you have one central operations team that needs to keep all your services up and running

what you need is => You Build It, You Run It: an operating model in which product teams build, deploy, operate and support their own digital services

-> playbook: Equal Experts - You Build It, You Run It playbook

How do you know you need You Build It, You Run It? and when do you not need it?

=> decision table: financial exposure on failure vs product feature demand (see playbook)

What does You Build It, You Run It look like for digital services?

  • product teams: delivery and monitoring + level 1 on-call

What does Ops Run It look like for foundational systems?

  • product team: build
  • operations: deploy, monitoring
  • operational enablers: DBAs, …

It sounds great… but it won’t work here

Developers won’t want to do it

We haven’t given our developers what they need

  • have you explained the mission?
  • have you asked what’s wrong, and listened? -> listen to their concerns until you have a full understanding
  • have you committed to putting things right?

=> give your developers what they need

“You need to pay your developers for being on-call due to the implicit social sacrifice. If you don’t understand this, someone else will, and they’ll move to that company”. @SteveSmith_Tech #AOTB2022

– Abraham Marin-Perez (@AbrahamMarin), Jul 7, 2022

It won’t scale to lots of teams

We haven’t balanced financial exposure and on-call costs

You Build It, You Run It is a multi-cost insurance for customer outcomes!! -> recognise opportunity costs and revenue protection matter

  • have you forecast teams, services and financial exposure?
  • have you understood “everything always on” is a myth?

    you have a few services that are critical: they have a lot of money running through them

    you have a lot of services that are important, but not critical

    you have a few services that are not important at all

Nobody would be accountable

“We haven’t tried trusting our people to do the right thing”

  • have you explained to your leadership and teams why shifting accountabilities onto teams will help deliver customer outcomes?
  • have you linked operational objectives to outcomes?
  • have you encouraged delegating responsibilities?

trust your people to do the right thing => RACI model (responsible, accountable, consulted, and informed)

shifting responsibility is really hard to do

There’d be no incident management

We haven’t made incident management self-service

  • have you brought incident managers into the fold? => ensure your incident managers and developers appreciate how their orthogonal workloads tie into the mission
  • have you mapped out a self-service workflow? => automate away all the manual, repetitive actions and handoffs
  • have you run some chaos days?

How to make incident management self-service to make sure you can operate “You Build It, You Run It” by @SteveSmith_Tech #AOTB2022

Make incident management self service

– Abraham Marin-Perez (@AbrahamMarin), Jul 7, 2022

automated alert -> incident response platform (PagerDuty), which then drives:

  • incident updates -> ticketing system (ServiceNow) -> incident managers
  • incident channel -> collaboration platform (Slack) -> interested parties
  • contact on-call -> phone/app -> on-call developer + incident managers
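One reading of the workflow above is that a single automated alert fans out into three automated steps, with no manual handoff in between. The sketch below models that idea only. It does not call PagerDuty, ServiceNow or Slack; every function, class and service name in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Incident:
    service: str
    summary: str
    log: List[str] = field(default_factory=list)  # records who would be notified


def update_ticket(incident: Incident) -> None:
    # Stand-in for the ticketing system step (ServiceNow in the talk).
    incident.log.append(f"ticket: incident managers updated about {incident.service}")


def open_channel(incident: Incident) -> None:
    # Stand-in for the collaboration platform step (Slack in the talk).
    incident.log.append(f"channel: interested parties invited for {incident.service}")


def page_on_call(incident: Incident) -> None:
    # Stand-in for the phone/app page to the on-call developer.
    incident.log.append(f"page: on-call developer contacted for {incident.service}")


# Assumption: the incident response platform runs every step for every alert.
FAN_OUT: List[Callable[[Incident], None]] = [update_ticket, open_channel, page_on_call]


def handle_alert(service: str, summary: str) -> Incident:
    """Turn an automated alert into an incident and run each fan-out step."""
    incident = Incident(service, summary)
    for step in FAN_OUT:
        step(incident)
    return incident


incident = handle_alert("checkout", "error rate above threshold")  # hypothetical service
for entry in incident.log:
    print(entry)
```

The point of the sketch is the shape: the on-call developer and the incident managers are reached by automation, not by someone relaying messages between tools.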

Developers would be firefighters

developers would spend all their time solving problems instead of building features, and building features is what generates money

“We don’t measure or eliminate BAU work”

BAU (business as usual) is actually a synonym for unplanned work

  • have you learned to manage unplanned work? -> track BAU work items in your ticketing system -> prioritise BAU vs value work
  • have you visualised per-team rework rate? -> calculate the weekly % of unplanned work on each team: add a label “BAU” to your tickets => you can search every week for that label
  • have you built paved roads? -> boost developer experience by automating their user journeys (I do not mean a platform!!)

=> measure and eliminate BAU work
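The per-team rework rate can be sketched in a few lines, assuming you can export one week of tickets with a team name and a list of labels. The ticket shape, team names and function name below are assumptions for illustration; a real ticketing system will have its own export format.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def rework_rate_by_team(tickets: Iterable[Mapping], label: str = "BAU") -> Dict[str, float]:
    """Return the % of each team's tickets that carry the unplanned-work label."""
    total: Counter = Counter()
    unplanned: Counter = Counter()
    for ticket in tickets:
        total[ticket["team"]] += 1
        if label in ticket["labels"]:
            unplanned[ticket["team"]] += 1
    return {team: 100 * unplanned[team] / total[team] for team in total}


# Hypothetical export of one week's closed tickets.
week: List[dict] = [
    {"team": "payments", "labels": ["BAU"]},
    {"team": "payments", "labels": ["feature"]},
    {"team": "payments", "labels": ["feature"]},
    {"team": "payments", "labels": ["BAU", "bug"]},
    {"team": "search", "labels": ["feature"]},
]

for team, rate in rework_rate_by_team(week).items():
    print(f"{team}: {rate:.0f}% unplanned work this week")
```

Counting tickets is the simplest possible measure. If your tickets carry estimates or time logged, weighting by those would be an equally reasonable choice.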

We can’t hire a DBA for every team

we never asked for that!

“We haven’t made repeatable, specialist tasks self-service”

  • have you understood this isn’t You Build It You Run It? -> this is an unnecessary extreme!!
  • have you rejected embedding specialists in teams? -> DBAs would be stretched across teams, overworked/bored and unhappy
  • have you automated repeatable specialist tasks? -> offload low value tasks to the cloud provider, automate high value tasks as self-service

-> treat specialists at scale as a problem to solve

Empowering a development team doesn’t necessarily mean get rid of experts like DBAs, it just means that those experts need to be engaged in the creation of self-service platforms for the developers to use @SteveSmith_Tech #AOTB2022

– Abraham Marin-Perez (@AbrahamMarin), Jul 7, 2022

We haven’t done this before

“We don’t know how to get started”

-> share the mission -> pick a pilot team: do not try this on all teams at once, try some small experiments -> change the mindset

Takeaways

You Build It You Run It is great and it can work here!