client_golang icon indicating copy to clipboard operation
client_golang copied to clipboard

Consider (optionally?) enforcing certain metric naming conventions

Open bwplotka opened this issue 5 years ago • 5 comments

Hi,

I think it makes sense to have this verification on client-side in official Prometheus library - some rules are must-have like _total. I guess the problem is how to error... probably panic is the only option?

Alternatively, we can implement it on our (e.g Thanos side). Plus we can be even more opinionated. Some wrapper that will panic if:

  • NewCounter is invoked without _total suffix
  • NewTimeHistogram invoked without _seconds and NewSizeHistogram invoked without _bytes (we only have two types).

Alternatively (3rd) NewCounter could always add _total if not specified, but that's surprising. Sometimes we discover metrics from code.

Use case: 10000 bugs where we forgot to ensure _total of counter suffix in PR review time...

bwplotka avatar Mar 12 '20 07:03 bwplotka

It is perfectly valid to have a counter without _total in the Prometheus exposition format, though not encouraged.

brian-brazil avatar Mar 12 '20 08:03 brian-brazil

There is already some validation of metric names. And there is an established way of reporting errors. (They are reported during registration time of a metric, not during construction time. This might not be the most intuitive way, but error handling has organically grown here. It can be rethought for v2, but right now, we cannot really change it in a breaking way.)

Having said that, because the _total suffix is just a convention, not a requirement, introducing the requirement now would be a breaking change.

Things are a bit different on OpenMetrics. For that, there is #683 (which I still have to complete with my thoughts how to implement this without breaking a lot of current users, but input of others is welcome).

For v2, we can go wild, of course, and we could elect to be more restrictive for naming. For the current world order, I think the approach with promlint is the right way to test for legal but discouraged metric names.

I'd suggest to move this issue to the v2 milestone as it cannot really be implemented within v1. And I'd also word it as "Consider to…" as it still needs to be discussed if we want to be more restrictive in naming than what the exposition format allows. (Even OpenMetrics doesn't enforce seconds in the metric name of a histogram of durations.)

Note that there is also #426 which deals with formal validations that are currently not even possible during registration time (but will happen during scrape time).

beorn7 avatar Mar 12 '20 13:03 beorn7

I adjusted the issue according to my suggestions above.

I guess if something is just a convention, it would be bad if an instrumentation library enforce it. But perhaps we can have an option somewhere in v2 where a user can ask the library to check/enforce certain conventions.

We don't have to decide that now. v2 needs more fundamental design decisions first. (I wanted to turn my Gophercon EU talk into a straw poll about opinions in the Go community, but oh well, the conference got postponed…)

beorn7 avatar Mar 13 '20 14:03 beorn7

Somehow related: https://github.com/thanos-io/thanos/issues/2430

bwplotka avatar Apr 21 '20 06:04 bwplotka