docs/user/operator/custom-resource-quickstart.md: wrong `up{job=…}` metrics
A bunch of Grafana Kubernetes dashboards (like the default ones produced by kubernetes-monitoring/mixins, rendered at https://github.com/monitoring-mixins/website/tree/master/assets/kubernetes/dashboards) use the following query to construct the $cluster dashboard variable:
label_values(up{job="cadvisor"}, cluster)
Likewise, there are other queries using job="kubelet".
This means it'd be convenient if the quickstart in docs/user/operator/custom-resource-quickstart.md provided the same job labels that are expected there, for a nice out-of-the-box experience.
However, things are a bit convoluted.
The up metric seems to have the job=kubelet label (this seems to come from the job name in the rendered scraping config), as the metricRelabelings entry below only seems to apply to the individual scraped metrics, not to the up metric describing the scrape job.
- action: replace
  targetLabel: job
  replacement: integrations/kubernetes/cadvisor
Also, as the dashboard query above shows, the replacement should be set to cadvisor, not integrations/kubernetes/cadvisor.
It seems servicemonitors.monitoring.coreos.com.spec.jobLabel might be the right attribute to set, to replace all job labels in the returned metrics properly.
However, the description
The label to use to retrieve the job name from.
… doesn't really elaborate on which resource needs to be labelled to be able to define a custom job name.
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!
not stale
hey there @flokli thanks for surfacing this!
- so this was originally written so that users could follow the quickstart, and have it work "out of the box" with the grafana cloud kubernetes integration (as a drop-in replacement for deploying agent manually using those generated configs & manifests). the integration uses `job=integrations/kubernetes/*` labels, so those are the ones we provide in that guide. we probably should add a note instructing users on how to change this (for example to get this working with different selectors or OSS mixin, etc. "out of the box")
- great catch, yea it seems like `servicemonitors.monitoring.coreos.com.spec.jobLabel` is the label to set - this will default to using the service's name if this isn't provided or if the corresponding label name provided as the `jobLabel` is not set on the service. since the service is created by operator (from what i recall) and its name is `kubelet`, i think all the endpoints created will have a `job=kubelet` label set for the `up` metric (and not sure how this interacts with the relabel configs....)
since we are setting a job=kubelet label by default for all the cadvisor and kubelet endpoints (at least for the up metrics), i will label this as a bug - need to see how prom operator solved this and dig a bit deeper here...
i should have some time to look into this shortly, but feel free to dig around in the meantime!!
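To make the `jobLabel` behaviour described above concrete, here is a rough sketch based on the Prometheus Operator semantics: `jobLabel` names a label on the selected Service, and that label's value becomes the `job` label, falling back to the Service name. The label key `example.com/job-name` and the resource names are hypothetical, not taken from the quickstart.

```yaml
# Sketch only: label key and resource names are assumptions for illustration.
apiVersion: v1
kind: Service
metadata:
  name: kubelet
  labels:
    app.kubernetes.io/name: kubelet
    example.com/job-name: cadvisor   # hypothetical label; its value would become job="cadvisor"
spec:
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10250
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cadvisor-monitor
spec:
  jobLabel: example.com/job-name     # label on the Service to read the job name from
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
  endpoints:
  - port: https-metrics
    path: /metrics/cadvisor
    scheme: https
```

One caveat with this approach: the kubelet and cadvisor endpoints are served by the same `kubelet` Service, so a single Service label cannot give them different job names, and a Service managed by the operator may have manually added labels overwritten.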
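based on the ordering, i think using `relabelings` instead of `metricRelabelings` will solve this: `relabelings` are applied to the target before the scrape, so they also affect the `up` metric, while `metricRelabelings` only touch the scraped samples. i think it'll override the default `kubelet` value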
removing bug label for now, will be able to test shortly
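A minimal sketch of what that change could look like on the cadvisor ServiceMonitor endpoint, assuming the OSS mixin job name `cadvisor` is wanted. The resource name and selector are illustrative assumptions, and other endpoint fields (TLS, bearer token and so on) are left out.

```yaml
# Sketch only: moves the job override from metricRelabelings to relabelings.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cadvisor-monitor             # assumed name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet   # assumed selector
  endpoints:
  - port: https-metrics
    path: /metrics/cadvisor
    scheme: https
    relabelings:
    # Applied to the target before scraping, so the label also ends up on `up`.
    - action: replace
      targetLabel: job
      replacement: cadvisor          # or integrations/kubernetes/cadvisor for the Grafana Cloud integration
```

With this in place, a query such as `up{job="cadvisor"}` should match the target, which is what the dashboards' `$cluster` variable relies on.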
@hjet I can verify it works.
awesome, thanks for verifying! this got away from me - have had a busy couple of weeks. will get to this soon but in the meantime anyone should feel free to put up a PR here
I took a stab at this in https://github.com/grafana/agent/pull/1810, PTAL.
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!
Not stale, waiting for https://github.com/grafana/agent/pull/1810 to be merged.
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!
Not stale.