Notes from on-call.
Field notes on monitoring, incidents, patching, and the slow craft of running production.
Runbook-as-code, two years in
We started writing our runbooks as version-controlled markdown two years ago. The pattern stuck, we made some mistakes along the way, and we'd like to spare you a few of them.
The five alert classes that earn their pages
If your alert volume is up and your incident count isn't, you've got a noise problem. We sort alerts into five classes, and only one of them is allowed to wake humans up.
Patching as an SLO, not a queue
Most teams treat patching as work that piles up. Some treat it as an SLO. The difference shows up in CVE response time and in how many conversations with the CISO it takes to ship anything.