
User Request: Dashboard in Datadog

Open xpdable opened this issue 1 year ago • 2 comments

Does anyone have experience building a Datadog dashboard that combines chaos.* metrics with application metrics? That way SREs could easily monitor and compare chaos injections against the steady state. Thanks in advance.

xpdable, Mar 25 '24 09:03

Hi. Yes, we can probably share the queries we're using for specific widgets. Is there anything in particular you'd like to visualize that you're having trouble with?

ptnapoleon, Mar 25 '24 17:03


Hi Philip, my idea for a pilot showcase is quite simple for now.

  1. A widget showing application status: the HTTP response code distribution (2xx vs. 4xx and above) as a bar chart over time.

  2. Another widget plotting chaos-controller metrics over time, showing when my DisruptionCron/Disruption resources are injected. With both widgets on one dashboard, there would be a clear view of steady state vs. turbulence.
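The two-widget layout described above can be sketched as a Datadog dashboard definition. This is a minimal sketch, not an official template: it only builds the JSON (in the `layout_type: ordered` / `timeseries` widget shape accepted by Datadog's dashboard API) and does not call any API. The application metric `myapp.http.requests` and its `status_class` tag are hypothetical placeholders for whatever your service emits, and the assumption that `chaos.controller.disruptions.gauge` carries a `namespace` tag should be checked against your own metrics.

```python
import json


def timeseries_widget(title, query, display_type):
    """Return one timeseries widget in Datadog's dashboard JSON shape."""
    return {
        "definition": {
            "type": "timeseries",
            "title": title,
            "requests": [{"q": query, "display_type": display_type}],
        }
    }


def build_dashboard(namespace):
    """Build a two-widget steady-state vs. chaos dashboard for one namespace."""
    # HYPOTHETICAL application metric: a request counter tagged with status_class
    # (e.g. 2xx, 4xx, 5xx). Replace it with the metric your service actually emits.
    app_query = (
        f"sum:myapp.http.requests{{namespace:{namespace}}} by {{status_class}}.as_count()"
    )
    # Chaos side: number of ongoing disruptions in the same namespace
    # (ASSUMES the metric carries a namespace tag).
    chaos_query = f"max:chaos.controller.disruptions.gauge{{namespace:{namespace}}}"
    return {
        "title": f"Steady state vs. chaos: {namespace}",
        "layout_type": "ordered",
        "widgets": [
            # Bars grouped by status class: the steady-state view.
            timeseries_widget("HTTP responses by status class", app_query, "bars"),
            # Line of ongoing disruptions: the turbulence view.
            timeseries_widget("Ongoing disruptions", chaos_query, "line"),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_dashboard("checkout"), indent=2))
```

Because both widgets sit on one dashboard, they share its time range, so injection windows line up visually with any shift in the status-code bars.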

Thanks, Xiaopeng

xpdable, Mar 26 '24 09:03

@ptnapoleon Do you have any good ideas for this? Thanks

xpdable, May 13 '24 06:05

Hi, so sorry about the delay, I forgot to get back to you.

I can't help with the first point; it's outside the scope of the project, and I'm not an expert on best practices there. For the latter, the chaos.controller.validation.created metric, which you can filter by namespace and target, shows when disruptions are created. chaos.controller.disruptions.gauge, with similar filtering, can show you an ongoing count of disruptions, and chaos.controller.pods.gauge will show you the live injector pods for any given disruption.
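As a rough illustration, the three metrics above could be turned into Datadog metric queries like this. It is a sketch under assumptions: the tag keys `namespace` and `target` are guessed from the "filter by namespace and target" wording and may differ from the tags the controller actually emits, the aggregators (`sum`, `max`) are just reasonable choices, and `.as_count()` assumes `chaos.controller.validation.created` is submitted as a count.

```python
def chaos_queries(namespace, target=None):
    """Build metric queries for the three chaos-controller metrics.

    ASSUMPTION: the metrics are tagged with `namespace` and (optionally)
    `target`; verify the real tag keys before relying on these filters.
    """
    tags = [f"namespace:{namespace}"]
    if target is not None:
        tags.append(f"target:{target}")
    # Comma-separated tags inside {...} are ANDed together in a Datadog scope.
    scope = "{" + ",".join(tags) + "}"
    return {
        # Disruption creations per interval: spikes mark when injections start.
        "created": f"sum:chaos.controller.validation.created{scope}.as_count()",
        # How many disruptions are ongoing at each point in time.
        "ongoing": f"max:chaos.controller.disruptions.gauge{scope}",
        # Live injector pods backing the matching disruptions.
        "injector_pods": f"sum:chaos.controller.pods.gauge{scope}",
    }


if __name__ == "__main__":
    for name, query in chaos_queries("checkout", target="my-app").items():
        print(f"{name}: {query}")
```

Any of these strings can be pasted into a timeseries widget's query editor next to the application widgets; the `created` query shows discrete start events, while `ongoing` shows how long each injection lasts.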

These will all work for disruptions created directly or via disruptionCron.

The full list of metrics you can use is here: https://github.com/DataDog/chaos-controller/blob/main/docs/metrics_events.md

For specific help with the Datadog dashboard product, you can check out Datadog's docs at https://docs.datadoghq.com/ , the public Slack at https://chat.datadoghq.com/ , or contact support.

ptnapoleon, May 20 '24 20:05