What no engineering blog tells you about on-call.

By Tomás Iglesias · site reliability engineer

Engineering blogs about on-call always undersell one specific thing: the cost of the calm weeks. Everyone writes about the dramatic incident, the post-mortem, the 3am page. Nobody writes about the eight uneventful weeks that came before, where you carried the pager and were a bit less present in your own life and didn't notice it accumulating.

The drama is easier to talk about because it has a shape. The slow tax doesn't. You sleep badly the night before your week starts. You don't drink the second glass of wine at dinner. You decline the trip your friend invited you on because the on-call window overlaps. None of these are individually noticeable. Together they're the actual cost.

Good on-call rotations are not the ones that respond fast to incidents. They're the ones that protect the silence between incidents. That means real handoffs, real backup, real permission to ignore the pager when you genuinely cannot answer it. Most rotations I've seen quietly fail this test.

If your team only measures on-call by incident response time, you're optimising for the visible part of the problem and ignoring the larger part. The engineers who burn out aren't the ones who couldn't handle the page. They're the ones who carried the calm weeks for too long without anyone noticing.
