
Rework dashboard

Open SuperQ opened this issue 1 year ago • 9 comments

Rework the dashboard to be more useful.

  • Use rate() where needed to get correct results.
  • Add support for native histograms.
  • Improve dashboard variables.

SuperQ avatar Apr 23 '24 07:04 SuperQ

Fixes: https://github.com/SuperQ/smokeping_prober/issues/150

SuperQ avatar Apr 23 '24 07:04 SuperQ

Fixes: https://github.com/SuperQ/smokeping_prober/issues/100

SuperQ avatar Apr 23 '24 07:04 SuperQ

Fixes: https://github.com/SuperQ/smokeping_prober/issues/90

SuperQ avatar Apr 23 '24 07:04 SuperQ

Add support for native histograms.

This seems to be breaking the dashboard for people who aren't using native histograms. I'm getting this for the Average Latency graph:

Status: 500. Message: bad_data: invalid parameter "query": 1:1: parse error: unknown function with name "histogram_avg"

The new dashboard doesn't seem to break out multiple ping targets into their own panels anymore. This was useful to compare hosts and check if they behaved differently, e.g. due to routing. Being able to look at the sum of all hosts (by setting host and ip to all) is definitely useful, though.

And I can see how breaking them out would be bad if someone had dozens of targets. I'm not well-versed in Grafana; is there a way to add a checkbox that toggles this behavior?

dominikh avatar Apr 23 '24 11:04 dominikh

What version of Prometheus do you have?

SuperQ avatar Apr 23 '24 11:04 SuperQ

I can add the row configuration back in.

SuperQ avatar Apr 23 '24 11:04 SuperQ

What version of Prometheus do you have?

I'm on version 2.47.2. histogram_avg seems to have been added in 2.51.0, which was only released in March 2024. Even then, the function is documented as:

This function only acts on native histograms, which are an experimental feature.

and most users probably have their data in classic histograms, not native ones.

dominikh avatar Apr 23 '24 11:04 dominikh

Yes, and that's why there is an or in the query now. If the native histogram doesn't return data, it will use the classic histogram data.
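As a rough sketch of the fallback pattern described here (not the dashboard's actual query), the panel expression might look something like the following. The metric name `smokeping_response_duration_seconds` and the use of Grafana's `$__rate_interval` variable are assumptions made for illustration.

```promql
# Native histogram path: yields series only when native histograms are scraped.
# (Metric name is an assumption for illustration.)
histogram_avg(rate(smokeping_response_duration_seconds[$__rate_interval]))
or
# Classic histogram fallback: average latency = rate of sum / rate of count.
rate(smokeping_response_duration_seconds_sum[$__rate_interval])
/
rate(smokeping_response_duration_seconds_count[$__rate_interval])
```

Note that `or` only helps when the left-hand side evaluates to an empty result. On a Prometheus version that does not know `histogram_avg` at all, the whole expression fails to parse, which matches the error reported above.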

SuperQ avatar Apr 23 '24 12:04 SuperQ

Hi,

I also checked the reworked dashboard, and it looks like the or is missing in the third panel, Average Latency. This results in no data if native histograms are not enabled.

bboehmke avatar May 11 '24 18:05 bboehmke