celo-blockchain

Develop an on-call onboarding exercise to let people learn by doing

Open • piersy opened this issue 3 years ago • 1 comment

For example:

  • get these metrics for this group of clusters
  • make a dashboard showing X
  • get the logs of transaction X
  • redeploy a cluster in Forno

piersy, Nov 30 '21 18:11

Some notes/thoughts, thanks to @lvpeschke:

Nothing better than hands-on experience: lots of shadowing and reverse shadowing.

Wheels of misfortune! These are incident simulations: pretend an incident is happening, practice all aspects of incident management, then have a debrief.

The wheel creator looks at past incidents and designs a new one, or takes a past incident and hides the resolution from the team. Get the on-call team in one room; the incident creator then communicates the problems as they happen.

The creator can share their screen/screenshot and ask what action needs to be taken. The contestant then shares their screen and shows exactly the actions they would take. Great way for people to learn and share hacks.

Takes about 2hrs to prepare 1hr of wheel of misfortune. Great if you have tools that allow you to roll back time and see all the actual alerts/data.

Keep a log of all actions. Keep a log of all things that did not go well.
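
One possible way to keep both logs during a session, as a minimal Python sketch. The names `ExerciseLog`, `record`, `went_well`, and `debrief` are made up for illustration and do not refer to any existing tool; a shared doc would work just as well.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LogEntry:
    timestamp: datetime
    actor: str              # who took the action (creator or contestant)
    note: str               # what was done or observed
    went_well: bool = True  # False marks something to raise in the debrief


@dataclass
class ExerciseLog:
    """Hypothetical log for one wheel-of-misfortune session."""
    scenario: str
    entries: list[LogEntry] = field(default_factory=list)

    def record(self, actor: str, note: str, went_well: bool = True) -> None:
        # Timestamp every action so the debrief can replay the session in order.
        self.entries.append(
            LogEntry(datetime.now(timezone.utc), actor, note, went_well)
        )

    def problems(self) -> list[LogEntry]:
        # Only the things that did not go well.
        return [e for e in self.entries if not e.went_well]

    def debrief(self) -> str:
        lines = [f"Scenario: {self.scenario}", "Actions:"]
        lines += [f"  {e.timestamp:%H:%M:%S} {e.actor}: {e.note}" for e in self.entries]
        lines.append("Did not go well:")
        lines += [f"  {e.actor}: {e.note}" for e in self.problems()]
        return "\n".join(lines)
```

Calling `record(...)` for each step and `debrief()` at the end gives one text summary containing both the full action log and the list of things to improve.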

piersy, Dec 09 '21 11:12