Add Guidance wrt Labelling to Naming and Rules Best Practices

Open conallob opened this issue 5 months ago • 11 comments

Add Guidance wrt Labelling to Naming and Rules Best Practices to docs/practices/naming.md and docs/practices/rules.md, specifically:

  • The primary purposes of job and instance
  • Include WARNINGS about accidentally stripping the job label, especially in multi-tenant systems
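As a rough illustration of how the job label can be stripped by accident, the sketch below contrasts two recording rules. The metric `http_requests_total`, the group name, and the assumption that the series carry only `job`, `instance`, and `path` labels are hypothetical, not taken from the PR.

```yaml
groups:
  - name: example-aggregation   # hypothetical group name
    rules:
      # Risky: "by (path)" keeps only the path label, so job (and every
      # other label) is dropped from the result.
      - record: path:http_requests:rate5m
        expr: sum by (path) (rate(http_requests_total[5m]))
      # Safer: "without (instance)" removes only instance, so job survives
      # and tenants stay distinguishable downstream.
      - record: job_path:http_requests:rate5m
        expr: sum without (instance) (rate(http_requests_total[5m]))
```

In a multi-tenant setup, the first rule merges requests from every job that exposes the same metric name, which is the failure mode the warning is meant to flag.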

This Fixes https://github.com/prometheus/docs/issues/2690

conallob avatar Jul 15 '25 12:07 conallob

Friendly ping @SuperQ @beorn7 ?

conallob avatar Jul 28 '25 22:07 conallob

Obligatory post-it note reminder: https://photos.app.goo.gl/Bkfir4wRiLtNVG4W8

conallob avatar Jul 28 '25 22:07 conallob

With my current patchy availability, there is little chance I get to this anytime soon. Maybe @juliusv has a qualified opinion here?

beorn7 avatar Jul 29 '25 10:07 beorn7

Friendly ping @SuperQ @juliusv

conallob avatar Aug 03 '25 23:08 conallob

Hey @conallob, congrats for the nice PR!

Just one thing: maybe you could fix the eaach typo in

  • The job label is a primary key to differentiate metrics from eaach other.

There is a small confusion I would love to see fixed, in a paragraph just below one of your edits. It is:

To keep the operations clean, _sum is omitted if there are other operations, as sum().

I don't understand the as sum() part. Like, "x is omitted if there are other operations such as x"? It doesn't make sense to me, in a very basic way. I know it's out of the scope of this PR, but maybe you could touch it to clarify.

andrechalella avatar Aug 13 '25 15:08 andrechalella

> Hey @conallob, congrats for the nice PR!
>
> Just one thing: maybe you could fix the eaach typo in
>
>   • The job label is a primary key to differentiate metrics from eaach other.
>
> There is a small confusion I would love to see fixed, in a paragraph just below one of your edits. It is:

Fixed the typo.

> To keep the operations clean, _sum is omitted if there are other operations, as sum().
>
> I don't understand the as sum() part. Like, "x is omitted if there are other operations such as x"? It doesn't make sense to me, in a very basic way. I know it's out of the scope of this PR, but maybe you could touch it to clarify.

I'm afraid that best practice is unrelated.

It also makes sense as written, once you've written enough rules. It's weighing up the trade-off between tracking the chain of operations across a pipeline of rules vs the rule name growing unwieldy. Many of these best practices trace back to specific philosophies from Prometheus' predecessor.
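For readers unfamiliar with the convention, here is a minimal sketch of that trade-off, assuming the level:metric:operations naming scheme from the rules best practices. The metric `requests_total`, the job name `myjob`, and the group name are illustrative only.

```yaml
groups:
  - name: example-naming   # hypothetical group name
    rules:
      # rate() over a counter: the _total suffix is dropped and rate5m is
      # recorded as the operation.
      - record: instance_path:requests:rate5m
        expr: rate(requests_total{job="myjob"}[5m])
      # Summing away instance changes only the level (instance_path -> path);
      # "sum" is not appended, because rate5m is already listed as an operation.
      - record: path:requests:rate5m
        expr: sum without (instance) (instance_path:requests:rate5m{job="myjob"})
```

Appending every aggregation would give names like path:requests:rate5m_sum, which grow with each stage of the pipeline.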

If you still think it needs a polish, please file a separate doc bug.

conallob avatar Aug 14 '25 11:08 conallob

Ping @juliusv, since @SuperQ is currently unavailable for life reasons

conallob avatar Aug 14 '25 11:08 conallob

PTAL

conallob avatar Aug 17 '25 19:08 conallob

Friendly ping?

conallob avatar Aug 21 '25 21:08 conallob

Friendly, you're not at SRECon EMEA this week, ping?

conallob avatar Oct 06 '25 15:10 conallob

For perspective, one of the motivations behind this PR is the anti-pattern of writing alert expressions intended for a single-tenant system that has since evolved into a multi-tenant system.

e.g. up{} for 5m without a job matcher works while there is only one job.

Once you start adding additional jobs that match on the same labels (e.g. DaemonSets, fleet-wide node_exporter, etc.), teams start getting paged for systems they don't own or care about.
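A minimal sketch of the scoped alternative, assuming a team that owns a job called `my-service`. The job name, alert name, group name, and severity label are all hypothetical.

```yaml
groups:
  - name: example-team-alerts   # hypothetical group name
    rules:
      - alert: TargetDown
        # The job matcher limits the alert to targets this team owns;
        # a bare up == 0 would fire for every job scraped by this server.
        expr: up{job="my-service"} == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.instance }} of job {{ $labels.job }} has been down for 5 minutes"
```

With the matcher in place, adding a fleet-wide exporter later does not change who gets paged by this rule.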

conallob avatar Oct 07 '25 16:10 conallob