User Request: Dashboard in Datadog
Does anyone have experience building a Datadog dashboard that combines the chaos.* metrics with application metrics, so that SREs can easily monitor and compare chaos injection against steady state? Thanks in advance.
Hi. Yes, we can probably share the queries we're using for specific widgets. Is there anything in particular you'd like to visualize that you're having trouble with?
Hi Philip, my idea is quite simple for now, just for a pilot showcase:
- One widget shows the application status: the distribution of HTTP response codes (2xx vs. 4xx and above) as a bar chart over time.
- Another widget shows metrics from the chaos controller over time, indicating when my DisruptionCron/Disruption resources are injected.
Putting these two in one dashboard would give a clear view of steady state vs. turbulence.
Thanks, Xiaopeng
@ptnapoleon Do you have any good ideas on this? Thanks
Hi, so sorry about the delay, I forgot to get back to you.
I can't help with the first point; it's outside the scope of the project, and I'm not an expert on the best practices. For the latter, we have these metrics:
- chaos.controller.validation.created, which you can filter by namespace and target to see when disruptions are created.
- chaos.controller.disruptions.gauge, which with similar filtering can show you an ongoing count of disruptions.
- chaos.controller.pods.gauge, which will show you the live injector pods for any given disruption.
These all work for disruptions created directly or via DisruptionCron.
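As a rough illustration of how the three metrics above could be turned into timeseries widgets, here is a minimal Python sketch that builds Datadog query strings and a dashboard definition as JSON. The tag key `namespace`, the namespace value, the widget titles, the `sum` aggregator, and the helper names `metric_query` and `timeseries_widget` are all assumptions for illustration; check the metrics documentation for the tags your controller actually emits. The sketch only prints the JSON and does not call any Datadog API.

```python
import json


def metric_query(metric, aggregator="sum", **tags):
    """Build a query string such as 'sum:my.metric{key:value}'.

    With no tags, the scope falls back to '*' (all sources).
    """
    scope = ",".join(f"{k}:{v}" for k, v in sorted(tags.items())) or "*"
    return f"{aggregator}:{metric}{{{scope}}}"


def timeseries_widget(title, query, display_type="line"):
    """Return a timeseries widget definition as a plain dict."""
    return {
        "definition": {
            "type": "timeseries",
            "title": title,
            "requests": [{"q": query, "display_type": display_type}],
        }
    }


# Hypothetical namespace; the tag key "namespace" is also an assumption.
NAMESPACE = "my-namespace"

dashboard = {
    "title": "Chaos injection vs. steady state",
    "layout_type": "ordered",
    "widgets": [
        # Bars mark the moments when disruptions are created.
        timeseries_widget(
            "Disruptions created",
            metric_query("chaos.controller.validation.created", namespace=NAMESPACE)
            + ".as_count()",
            display_type="bars",
        ),
        # Ongoing count of disruptions.
        timeseries_widget(
            "Ongoing disruptions",
            metric_query("chaos.controller.disruptions.gauge", namespace=NAMESPACE),
        ),
        # Live injector pods.
        timeseries_widget(
            "Live injector pods",
            metric_query("chaos.controller.pods.gauge", namespace=NAMESPACE),
        ),
    ],
}

print(json.dumps(dashboard, indent=2))
```

An application widget (for example, HTTP response codes grouped by status class) could be added to the same `widgets` list with whatever metric your application reports, so both views share one time axis.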
The full list of metrics you can use is here: https://github.com/DataDog/chaos-controller/blob/main/docs/metrics_events.md
For specific help with the Datadog dashboard product, you can check out the Datadog docs at https://docs.datadoghq.com/ , the public Slack at https://chat.datadoghq.com/ , or contact support.