Add Guidance wrt Labelling to Naming and Rules Best Practices
Add Guidance wrt Labelling to Naming and Rules Best Practices to docs/practices/naming.md and docs/practices/rules.md, specifically:
- The primary purposes of `job` and `instance`
- Include WARNINGS about accidentally stripping the `job` label, especially in multi-tenant systems
This fixes https://github.com/prometheus/docs/issues/2690
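To illustrate the warning about stripping `job`, here is a minimal sketch of a recording rule. The metric `requests_total`, the job name `myjob`, and the group name are hypothetical placeholders. `sum by (path)` keeps only the listed labels, so `job` is silently dropped. `sum without (instance)` removes only `instance` and keeps `job` on the output:

```yaml
groups:
  - name: job-label-example   # hypothetical group name
    rules:
      # Avoid: `by (path)` keeps only `path`, so the output series carries no
      # `job` label and can no longer be attributed to (or alerted on per) a job.
      # - record: path:requests:rate5m
      #   expr: sum by (path) (rate(requests_total{job="myjob"}[5m]))

      # Prefer: `without (instance)` drops only `instance`; `job` is kept.
      - record: path:requests:rate5m
        expr: sum without (instance) (rate(requests_total{job="myjob"}[5m]))
```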
Friendly ping @SuperQ @beorn7 ?
Obligatory post-it note reminder: https://photos.app.goo.gl/Bkfir4wRiLtNVG4W8
With my current patchy availability, there is little chance I get to this anytime soon. Maybe @juliusv has a qualified opinion here?
Friendly ping @SuperQ @juliusv
Hey @conallob, congrats for the nice PR!
Just one thing: maybe you could fix the `eaach` typo in

> The `job` label is a primary key to differentiate metrics from eaach other.
There is a small confusion I would love to see fixed, in a paragraph just below one of your edits. It is:
> To keep the operations clean, `_sum` is omitted if there are other operations, as `sum()`.

I don't understand the `as sum()` part. Like, "`x` is omitted if there are other operations such as `x`"? It doesn't make sense to me, in a very basic way. I know it's out of the scope of this PR, but maybe you could touch it to clarify.
> Just one thing: maybe you could fix the `eaach` typo in
>
> > The `job` label is a primary key to differentiate metrics from eaach other.
Fixed the typo.
> To keep the operations clean, `_sum` is omitted if there are other operations, as `sum()`.
>
> I don't understand the `as sum()` part. Like, "`x` is omitted if there are other operations such as `x`"? It doesn't make sense to me, in a very basic way. I know it's out of the scope of this PR, but maybe you could touch it to clarify.
I'm afraid that best practice is unrelated to this PR.
It also makes sense as written, once you've written enough rules. It's weighing up the trade-off between tracking the chain of operations across a pipeline of rules vs the rule name growing unwieldy. Many of these best practices trace back to specific philosophies from Prometheus' predecessor.
If you still think it needs polishing, please file a separate doc bug.
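For anyone with the same question, the `_sum` convention can be sketched with two chained recording rules in the `level:metric:operations` style used in docs/practices/rules.md. The metric and job names are placeholders. The second rule aggregates with `sum`, yet it is named `path:requests:rate5m` rather than `path:requests:rate5m_sum`. The `rate5m` operation is already part of the name, so `_sum` is left off to stop the name growing unwieldy:

```yaml
groups:
  - name: naming-example   # hypothetical group name
    rules:
      # Per-instance, per-path request rate; the name records the `rate5m` operation.
      - record: instance_path:requests:rate5m
        expr: rate(requests_total{job="myjob"}[5m])

      # Aggregated with sum, but `_sum` is omitted from the name because
      # another operation (`rate5m`) is already listed.
      - record: path:requests:rate5m
        expr: sum without (instance) (instance_path:requests:rate5m{job="myjob"})
```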
Ping @juliusv, since @SuperQ is currently unavailable for life reasons
PTAL
Friendly ping?
Friendly, you're not at SRECon EMEA this week, ping?
For perspective, one of the motivations behind this PR is the anti-pattern of writing alert expressions intended for a single-tenant system that has since evolved into a multi-tenant one.
e.g. `up{}` for 5m without defining a `job` label works while there is only one job.
Once you start adding additional jobs that match on the same labels (e.g. DaemonSets, fleet-wide node_exporter, etc.), teams start getting paged for systems they don't own or care about.
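A minimal sketch of the difference as a Prometheus alerting rule follows. The job name `myapp`, the alert name, and the `severity` label value are hypothetical placeholders. The unscoped `up == 0` matches every scrape target on the server. Once node_exporter or DaemonSet targets share that server, their outages page this team too. Adding a `job` matcher limits the alert to the team's own targets, and the `job` label is then carried into the alert's labels for routing:

```yaml
groups:
  - name: myapp-alerts   # hypothetical group name
    rules:
      # Avoid: matches every target on the server, whichever job it belongs to.
      # - alert: InstanceDown
      #   expr: up == 0
      #   for: 5m

      # Prefer: scoped to the owning job.
      - alert: MyAppInstanceDown
        expr: up{job="myapp"} == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.instance }} of job {{ $labels.job }} is down"
```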