The Rule
Do not check whether things are fine.
Pain Point
Manually checking systems to confirm they are “okay” creates unnecessary cognitive load.
Dashboards and queues become reassurance rituals: they consume time and attention without changing outcomes.
If something is broken, you should be told.
Do
For anything that might require action, ensure there is an automated signal.
- The signal should reach the people who can meaningfully act on it.
- It does not need to prescribe the action.
- Over time, actions may be formalised (runbooks, automation), but that is secondary.
If something requires attention, it should create noise. If it does not, it should remain silent.
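A minimal sketch of the idea above, assuming each concern can be expressed as a small probe function. The names here (`Check`, `run_checks`, `notify`, the owner strings, and the backlog threshold) are hypothetical and purely illustrative, not part of any particular tool. Each probe returns a problem description or `None`; only failures produce a signal, and that signal is routed to whoever can act on it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    name: str
    owner: str  # hypothetical: the person or team who can act on a failure
    probe: Callable[[], Optional[str]]  # returns a problem description, or None


def run_checks(checks: List[Check], notify: Callable[[str, str], None]) -> int:
    """Run every check and call notify(owner, message) only for failures.

    Healthy checks produce no output at all. Returns the number of signals sent.
    """
    sent = 0
    for check in checks:
        try:
            problem = check.probe()
        except Exception as exc:
            # A probe that cannot run is itself something that needs attention.
            problem = f"probe raised {exc!r}"
        if problem is not None:
            notify(check.owner, f"{check.name}: {problem}")
            sent += 1
    return sent


def queue_backlog() -> Optional[str]:
    depth = 1200  # placeholder value; a real probe would query the queue
    return f"backlog at {depth} items" if depth > 1000 else None


def disk_space() -> Optional[str]:
    return None  # healthy, so this check stays silent


outbox = []  # stands in for a pager, chat channel, or email
checks = [
    Check("order-queue", "payments-oncall", queue_backlog),
    Check("disk", "infra-oncall", disk_space),
]
sent = run_checks(checks, lambda owner, msg: outbox.append((owner, msg)))
```

Note that the message describes what is wrong but does not prescribe a fix, and the healthy check leaves no trace in `outbox`.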
Do Not
- Create mechanisms whose only purpose is to confirm that a system is healthy.
- Regularly inspect dashboards “just to be sure”.
- Rely on manual checks for reassurance.
When signal coverage is adequate, silence is the expected state.
Scope
This applies to operational, actionable failures: things that are broken now and require attention.
This does not cover:
- slow degradation
- trend monitoring
- preemptive or exploratory analysis
Analogy
Think of a perfect assistant: they interrupt you only when there is something you can act on. If they do not interrupt you, you can assume everything is fine.