Add observability-and-monitoring skill
Summary
Adds observability-and-monitoring — the one confirmed lifecycle gap in the collection (no existing PR, not covered by any current skill). Covers the full production visibility workflow: structured logging, four golden signals metrics, OpenTelemetry distributed tracing, SLO-based alerting, and dashboard design.
Changes
- New skill: skills/observability-and-monitoring/SKILL.md
- New reference: references/observability-checklist.md (logging, metrics, tracing, SLOs/alerting, dashboards, pre-launch gate)
- incremental-implementation: one-liner anchoring observability to slice definition ("define what you'll measure when you define the slice")
- using-agent-skills: decision tree, lifecycle (step 11, before shipping), quick reference table
- idea-refine: forward reference to spec-driven-development
- deprecation-and-migration: references to incremental-implementation, test-driven-development, git-workflow-and-versioning
- README: skill count 20 → 21, new entry in Ship section and directory tree
Review feedback addressed (from #59)
| Feedback | Fix |
|---|---|
| Decision tree trigger is reactive | Reworded to "Instrumenting a service / setting up monitoring?" |
| Observability only appears at step 11 | Added anchor in incremental-implementation — define measurements at slice definition time |
| No references/observability-checklist.md | Created; consistent with security-checklist.md, testing-patterns.md, performance-checklist.md |
| Steps assume greenfield | Added Step 0: brownfield audit — inventory existing instrumentation before adding more |
Reviews run before submission
This branch went through /codex:adversarial-review and /codex:review before the original PR #59. The findings from both reviews were fixed (metric cardinality labelling, OTel propagation, /ship command wiring, lifecycle ordering). This PR is the observability-only split of that work.
Implementation complete (parallel build).
- Commit: 90ae4dc
- Files changed: 8 (shared schema, Mongoose model, service, routes, 2 test files + 2 index updates)
- Tests: 372 passing (shared: 203, api: 144, web: 25)
- Merged with: Task 24 (parallel group: api-late)
Key points:
- Version numbers auto-increment per document starting at 1
- Diff absent on v1, computed via line-diff from v2 onward (no extra deps)
- Restore creates a NEW version (history is immutable)
- Editors can snapshot/restore; viewers get 403
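
The key points above can be sketched roughly as follows. This is an in-memory illustration only, not the code from the commit: `VersionHistory`, `Version`, `Role`, `HttpError`, `forbidden`, and `lineDiff` are hypothetical names, the Mongoose persistence layer is omitted, and the diff is a deliberately naive line comparison that ignores ordering and duplicate lines.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the actual implementation.
type Role = "editor" | "viewer";

interface Version {
  number: number; // auto-increments per document, starting at 1
  content: string;
  diff: string[] | null; // null on v1, since there is nothing to compare against
}

interface HttpError extends Error {
  status: number;
}

// Builds an error carrying the HTTP status a route handler would return.
function forbidden(): HttpError {
  const err = new Error("Forbidden") as HttpError;
  err.status = 403;
  return err;
}

// Naive line diff with no extra dependencies: lines only in `before` are
// reported as removed, lines only in `after` as added.
function lineDiff(before: string, after: string): string[] {
  const oldLines = before.split("\n");
  const newLines = after.split("\n");
  const removed = oldLines
    .filter((line) => newLines.indexOf(line) === -1)
    .map((line) => `- ${line}`);
  const added = newLines
    .filter((line) => oldLines.indexOf(line) === -1)
    .map((line) => `+ ${line}`);
  return removed.concat(added);
}

class VersionHistory {
  private readonly versions: Version[] = [];

  // Appends a new version; only editors may snapshot.
  snapshot(role: Role, content: string): Version {
    if (role !== "editor") throw forbidden();
    const previous = this.versions[this.versions.length - 1];
    const version: Version = {
      number: this.versions.length + 1,
      content,
      diff: previous ? lineDiff(previous.content, content) : null,
    };
    this.versions.push(version);
    return version;
  }

  // Restoring never rewrites history: it appends a NEW version whose
  // content is copied from the requested one.
  restore(role: Role, versionNumber: number): Version {
    if (role !== "editor") throw forbidden();
    const target = this.versions[versionNumber - 1];
    if (!target) throw new Error(`Unknown version ${versionNumber}`);
    return this.snapshot(role, target.content);
  }

  list(): readonly Version[] {
    return this.versions;
  }
}
```

In this sketch, restoring v1 after two snapshots yields a third version rather than rolling the list back, which is what keeps the history immutable. A production diff would more likely use an LCS-based algorithm so that reordered or repeated lines are reported correctly.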
→ Ready for dev review.
Rebased on main after v0.6.0 merge. Key conflict resolved: /ship was completely rewritten as a fan-out orchestrator — integrated the observability step into Phase B (step 6) rather than the old numbered checklist. README skill count updated to 22.