docs/user/operator/custom-resource-quickstart.md: wrong `up{job=…}` metrics
A bunch of Grafana Kubernetes dashboards (like the default ones produced by kubernetes-monitoring/mixins, rendered at https://github.com/monitoring-mixins/website/tree/master/assets/kubernetes/dashboards) use the following query to construct the `$cluster` dashboard variable:

```
label_values(up{job="cadvisor"}, cluster)
```

Likewise, there are other queries using `job="kubelet"`.

This means it'd be convenient if the quickstart in docs/user/operator/custom-resource-quickstart.md provided the same `job` labels that are expected there, for a nice out-of-the-box experience.
However, things are a bit convoluted. The `up` metric seems to have the `job=kubelet` label (this seems to come from the job name in the rendered scrape config), as the `metricRelabelings` entry only seems to apply to individual metrics, not to the `up` metric describing the scrape job:

```yaml
- action: replace
  targetLabel: job
  replacement: integrations/kubernetes/cadvisor
```

Also, as can be seen there, this should be set to `cadvisor`, not `integrations/kubernetes/cadvisor`.
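For context, a sketch of that rule with the replacement value the report suggests, so the `job` label matches what the OSS mixin dashboards query for (note that, per the observation above, this would still not change the `job` label on the `up` series itself):

```yaml
metricRelabelings:
  - action: replace
    targetLabel: job
    replacement: cadvisor  # what the mixin dashboards expect, rather than integrations/kubernetes/cadvisor
```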
It seems `servicemonitors.monitoring.coreos.com.spec.jobLabel` might be the right attribute to set to replace all `job` labels in the returned metrics properly. However, the description

> The label to use to retrieve the job name from.

… doesn't really elaborate on which resource needs to be labelled to be able to define a custom job name.
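To illustrate the ambiguity: with Prometheus Operator, `jobLabel` names a label on the *Service* that the ServiceMonitor selects, and the value of that Service label becomes the `job` label of the scraped targets. A hypothetical sketch (all names here are illustrative, not taken from the quickstart):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kubelet
  labels:
    my-job-name: cadvisor         # hypothetical label carrying the desired job name
spec:
  clusterIP: None
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cadvisor
spec:
  jobLabel: my-job-name           # take the job name from the Service's my-job-name label value
  selector:
    matchLabels:
      my-job-name: cadvisor
  endpoints:
    - port: https-metrics
```

If `jobLabel` is unset, or the named label is missing on the selected Service, the Service's own name is used as the job name instead.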
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!
not stale
hey there @flokli thanks for surfacing this!

so this was originally written so that users could follow the quickstart, and have it work "out of the box" with the grafana cloud kubernetes integration (as a drop-in replacement for deploying agent manually using those generated configs & manifests). the integration uses `job=integrations/kubernetes/*` labels, so those are the ones we provide in that guide. we probably should add a note instructing users on how to change this (for example to get this working with different selectors, the OSS mixin, etc. "out of the box")
great catch, yea it seems like `servicemonitors.monitoring.coreos.com.spec.jobLabel` is the label to set - this will default to using the service's name if it isn't provided, or if the label name provided as the `jobLabel` is not set on the service. since the service is created by the operator (from what i recall) and its name is `kubelet`, i think all the endpoints created will have a `job=kubelet` label set for the `up` metric (and i'm not sure how this interacts with the relabel configs....)
since we are setting a `job=kubelet` label by default for all the `cadvisor` and `kubelet` endpoints (at least for the `up` metrics), i will label this as a bug - need to see how prom operator solved this and dig a bit deeper here...
i should have some time to look into this shortly, but feel free to dig around in the meantime!!
based on the ordering, i think using `relabelings` instead of `metricRelabelings` will solve this. i think it'll override the default `kubelet` value
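A minimal sketch of that suggestion (endpoint details are illustrative, not taken from the quickstart): `relabelings` are rendered as Prometheus `relabel_configs` and run at target-relabeling time, before scraping, so the rewritten `job` label should also apply to the synthetic `up` series, unlike `metricRelabelings`:

```yaml
endpoints:
  - port: https-metrics
    path: /metrics/cadvisor
    # relabelings (not metricRelabelings) rewrite target labels,
    # so the job label on `up` is affected as well
    relabelings:
      - action: replace
        targetLabel: job
        replacement: integrations/kubernetes/cadvisor
```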
removing bug label for now, will be able to test shortly
@hjet I can verify it works.
awesome, thanks for verifying! this got away from me - have had a busy couple of weeks. will get to this soon but in the meantime anyone should feel free to put up a PR here
I took a stab at this in https://github.com/grafana/agent/pull/1810, PTAL.
Not stale, waiting for https://github.com/grafana/agent/pull/1810 to be merged.
Not stale.