web icon indicating copy to clipboard operation
web copied to clipboard

Add Monitors in Datadog for Pager Duty

Open nfrgosselin opened this issue 2 years ago • 0 comments

Acceptance Criteria Add Monitors for the following (this list is WIP):

  • [x] High number of 500s

  • [ ] High Number of instances (indicates Autoscaling is working, but consuming too many resources)

  • [x] High Latency on Page Load (indicates overall site performance degradation)

  • [ ] High number of jobs enqueued in Redis (indicates celery workers aren't keeping up with demand)

  • [x] Synthetic pageload tests failing (canary uptime test)

  • [ ] Ensure Read Replica DBs are also monitored

  • [x] High Mem on DBs

  • [x] High CPU on DBs (DB load is a known bottleneck, may need to find and kill long running queries)

  • [x] High Latency on DB Querries (indicates inefficient queries, or high db load)

  • [x] High CPU on cluster (indicates Autoscaling is lagging behind demand)

XD Links:

Tech Details:

Open Questions:

Notes/Assumptions:

nfrgosselin avatar Jul 28 '22 18:07 nfrgosselin