Fishbone (Ishikawa) Diagram
Five Whys gives you a chain. Fishbone gives you a map. Software bugs almost always need the map.
Kaoru Ishikawa's cause-categorisation diagram from 1960s Japanese manufacturing. Problem at the head, categories as bones, sub-causes branching off each bone. It's a structured cause-analysis technique, not a retrospective format — use it inside a post-incident retro when the cause is likely multi-factorial.
When to use
Inside a post-incident retrospective, on a specific incident with likely multiple contributing causes — most software incidents. Five Whys assumes one cause and gives you blame; Fishbone assumes many and gives you a map. Skip it as a whole-retrospective format; it isn't one. Skip it for incidents with a clear single chain of causation — Five Whys is faster and the diagram adds nothing.
How it runs
State the problem at the head
One sentence, specific, agreed by the room. 'Production was down for 47 minutes on Tuesday afternoon.' Not 'reliability is bad.' The whole diagram hangs off this sentence; vague problem, vague diagram.
Pick six categories — for software, not factories
Ishikawa's original 6 Ms (Manpower, Method, Material, Machine, Measurement, Mother Nature) are factory-floor relics. For software, use Code, Tooling, Process, People, Externals, Comms. Localise the categories or the team will spend the meeting arguing about which category 'flaky test' falls into.
Branch causes off each bone
For each category, list what plausibly contributed. Don't filter: write down everything that might have helped cause the incident, even weakly. Filtering is the next step; this step is breadth.
Sub-branch where causes have causes
Each cause can spawn its own sub-causes. Ishikawa's original technique recursively whys each branch — a hybrid with Five Whys, scoped to one bone at a time.
Pick the high-leverage causes
Look at the diagram. The ones that show up in multiple categories, the ones with the densest sub-branches — those are the high-leverage targets. Pick two; commit to action on each.
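The selection heuristic above can be sketched in code. This is a minimal Python sketch, not part of the technique itself: the `Cause` class, the `rank_causes` function, and the sample causes are hypothetical. Ranking by number of bones first and sub-branch count second is one assumed way to encode "shows up in multiple categories" and "densest sub-branches".

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Cause:
    """One cause on a bone; sub_causes are the causes of that cause."""
    text: str
    sub_causes: list[Cause] = field(default_factory=list)

    def size(self) -> int:
        """Count every sub-cause beneath this cause, at any depth."""
        return sum(1 + c.size() for c in self.sub_causes)


def rank_causes(bones: dict[str, list[Cause]]) -> list[tuple[str, int, int]]:
    """Return (cause, bones it appears on, total sub-branches), best first.

    Causes are matched across bones by lowercased text, which is an
    assumption: a real session would merge near-duplicates by hand.
    """
    categories: dict[str, set[str]] = defaultdict(set)
    density: dict[str, int] = defaultdict(int)
    for bone, causes in bones.items():
        for cause in causes:
            key = cause.text.strip().lower()
            categories[key].add(bone)
            density[key] += cause.size()
    ranked = [(key, len(categories[key]), density[key]) for key in categories]
    # More bones first, then denser sub-branches, then alphabetical for stability.
    ranked.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return ranked


# Illustrative data only; the grouping of sub-causes here is invented.
bones = {
    "Code": [Cause("missing FK index", [Cause("query plan changed")])],
    "Tooling": [
        Cause("missing FK index", [Cause("framework default skips FK indexes")]),
        Cause("no perf check before deploy"),
    ],
    "Process": [Cause("no canary step for migrations")],
}

top_two = rank_causes(bones)[:2]
for text, bone_count, sub_branches in top_two:
    print(f"{text}: {bone_count} bone(s), {sub_branches} sub-branch(es)")
```

On this sample, "missing FK index" ranks first because it sits on two bones. Treat the ranking as a prompt for the room's discussion, not a verdict.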
Why it works
Software incidents almost always have multiple weakly-coupled contributing causes — a flaky test, a deploy that skipped a check, a runbook that didn't cover the case, an alerting blind spot, a hand-off that lost context. Five Whys forces you to pick a single chain through that mess; Fishbone lets you keep all the threads on the page. The map shows where the contributing causes overlap, which is where the highest-leverage fix lives.
Variations
- Drop categories you don't have anything for. A Fishbone with three bones is fine; padding empty bones with weak causes makes the diagram lie.
- Run it async — share the empty diagram, let the team add causes asynchronously over 24 hours, then converge live to discuss patterns. Better breadth than a 60-minute live build.
- Combine with Five Whys per branch — Fishbone for breadth, then Five Whys recursively into the densest two or three bones.
Facilitator notes
Fishbone with eight engineers in front of a whiteboard is a 90-minute exercise — scope the meeting honestly. The most common failure is treating the diagram as the output: a beautifully detailed Fishbone with no follow-up actions is a wall decoration. The output is two named commitments, not a finished diagram.
Pitfalls
- Using the original 6 Ms unchanged. Manpower and Mother Nature don't translate; the team will spend ten minutes deciding what bones mean before they can put causes on them.
- Treating the diagram as the deliverable. The diagram is the workings; the commitments are the output.
- Padding empty bones. A clean three-bone diagram is more useful than a six-bone diagram with weak causes invented to fill the page.
- Running it as your whole retrospective. It's a technique inside a post-incident retro, not a standalone format.
Remote tips
The technique is visual, and remote whiteboards handle it well: pre-build the spine and the empty bones, then share your screen as the team adds causes. Async pre-population for 24 hours before the live discussion is almost always worth it; people remember contributing causes overnight that wouldn't surface in a 60-minute live session.
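If the team pre-populates in a shared document rather than a whiteboard, the spine and empty bones can be generated as a plain-text skeleton. A minimal sketch, assuming the software categories suggested earlier; the `empty_fishbone` function and the outline layout are hypothetical, not a standard format.

```python
# Category names from the "Pick six categories" step; adjust to your team.
SOFTWARE_BONES = ("Code", "Tooling", "Process", "People", "Externals", "Comms")


def empty_fishbone(problem: str, bones: tuple[str, ...] = SOFTWARE_BONES) -> str:
    """Build a text outline: the problem at the head, one empty bone per category."""
    lines = [f"Problem (head): {problem}"]
    for bone in bones:
        lines.append(f"Bone ({bone}):")
        lines.append("  - (add causes here)")
    return "\n".join(lines)


template = empty_fishbone("Production was down for 47 minutes on Tuesday afternoon.")
print(template)
```

Paste the output into the shared document when the async window opens, then delete any bones that are still empty before the live discussion.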
Example outputs
- Problem (head): production was down 14:32-15:19 on Tuesday after the migration.
- Bone (Code): missing FK index, query plan changed, no statement timeout in the migration framework.
- Bone (Process): no canary deploy step for migrations, runbook didn't cover migration rollback.
- Bone (Comms): on-call wasn't paged for 8 minutes — alerting threshold was set too high.
- Bone (Tooling): migration framework defaults skip FK indexes, no perf check before deploy.
- Commitment 1: change migration framework default to create FK indexes. Owner: platform.
- Commitment 2: add canary step for migrations touching tables over 1M rows. Owner: SRE.
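The same example can be kept as a small structured record so the commitments outlive the whiteboard. This is a sketch under assumptions: the record layout, the `Commitment` class, and the `check_record` function are hypothetical. The check encodes the guidance that the output is two named commitments, and empty bones are dropped rather than padded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    action: str
    owner: str


record = {
    "problem": "production was down 14:32-15:19 on Tuesday after the migration",
    "bones": {
        "Code": [
            "missing FK index",
            "query plan changed",
            "no statement timeout in the migration framework",
        ],
        "Process": [
            "no canary deploy step for migrations",
            "runbook didn't cover migration rollback",
        ],
        "Comms": ["on-call wasn't paged for 8 minutes (alerting threshold too high)"],
        "Tooling": [
            "migration framework defaults skip FK indexes",
            "no perf check before deploy",
        ],
        "People": [],      # nothing found; dropped below instead of padded
        "Externals": [],
    },
    "commitments": [
        Commitment("change migration framework default to create FK indexes", "platform"),
        Commitment("add canary step for migrations touching tables over 1M rows", "SRE"),
    ],
}


def check_record(rec: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is usable."""
    problems = []
    if len(rec["commitments"]) != 2:
        problems.append("expected exactly two commitments")
    problems += [
        f"no owner for: {c.action}" for c in rec["commitments"] if not c.owner.strip()
    ]
    return problems


# Keep only bones that actually have causes on them.
kept_bones = {name: causes for name, causes in record["bones"].items() if causes}
issues = check_record(record)
print(list(kept_bones), issues)
```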
FAQ
- Fishbone or Five Whys?
- Five Whys when the cause is likely a single chain — a mechanical failure that traces through one path. Fishbone when there are likely many contributing causes, which is most software incidents. Five Whys gives you a chain; Fishbone gives you a map. If you started Five Whys and the chain branched at every level, switch to Fishbone.
- Can I run Fishbone as my whole retrospective?
- No. Fishbone is a structured cause-analysis technique used inside a post-incident retrospective on a specific incident. Run Start-Stop-Continue (SSC) or Sailboat as the retro itself, then drop into Fishbone when the team has identified an incident worth mapping. A 60-minute Fishbone with no preceding retrospective is a post-mortem with extra steps.
- Why localise the 6 Ms?
- Ishikawa's categories — Manpower, Method, Material, Machine, Measurement, Mother Nature — were built for 1960s manufacturing. They translate badly to software. A software team's contributing-cause categories look more like Code, Tooling, Process, People, Externals, Comms. Use the local categories or you'll spend the first ten minutes arguing about which bone owns 'flaky test.'