agent docs/user/operator/custom-resource-quickstart.md: wrong `up{job=…}` metrics

A bunch of Grafana Kubernetes dashboards (like the default ones produced by kubernetes-monitoring/mixins, rendered at https://github.com/monitoring-mixins/website/tree/master/assets/kubernetes/dashboards) use the following query to construct the $cluster dashboard variable:

label_values(up{job="cadvisor"}, cluster)

Likewise, there are other queries using job="kubelet".

This means it'd be convenient if the quickstart in docs/user/operator/custom-resource-quickstart.md provided the same job labels that these dashboards expect, for a nice out-of-the-box experience.

However, things are a bit convoluted.

The up metric seems to have the job=kubelet label (this seems to come from the job name in the rendered scrape config), as the metricRelabelings entry only seems to apply to the individual scraped metrics, not to the up metric describing the scrape job itself.

    - action: replace
      targetLabel: job
      replacement: integrations/kubernetes/cadvisor

Also, as the dashboard query above shows, this should be set to cadvisor, not integrations/kubernetes/cadvisor.

Apr 13 '22 12:04 flokli

It seems servicemonitors.monitoring.coreos.com.spec.jobLabel might be the right attribute to set, to replace all job labels in the returned metrics properly.

However, the description

The label to use to retrieve the job name from.

… doesn't really elaborate on which resource needs to be labelled to be able to define a custom job name.

Apr 13 '22 12:04 flokli

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!

May 14 '22 00:05 github-actions[bot]

not stale

May 14 '22 08:05 flokli

hey there @flokli thanks for surfacing this!

so this was originally written so that users could follow the quickstart and have it work "out of the box" with the grafana cloud kubernetes integration (as a drop-in replacement for deploying agent manually using those generated configs & manifests). the integration uses job=integrations/kubernetes/* labels, so those are the ones we provide in that guide. we probably should add a note instructing users on how to change this (for example to get this working "out of the box" with different selectors, the OSS mixin, etc.)
great catch, yea it seems like servicemonitors.monitoring.coreos.com.spec.jobLabel is the label to set - this will default to using the service's name if this isn't provided or if the corresponding label name provided as the jobLabel is not set on the service. since the service is created by operator (from what i recall) and its name is kubelet, i think all the endpoints created will have a job=kubelet label set for the up metric (and not sure how this interacts with the relabel configs....)

since we are setting a job=kubelet label by default for all the cadvisor and kubelet endpoints (at least for the up metrics), i will label this as a bug - need to see how prom operator solved this and dig a bit deeper here...
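To illustrate the jobLabel behaviour described above, here is a rough sketch, not taken from the quickstart. The label key `monitoring-job` is hypothetical, and the Service spec (headless, port `https-metrics` on 10250) and the other metadata are assumptions about what the operator creates. The point is only that jobLabel names a label on the Service, and that its value becomes the job; without that label, the Service name (kubelet) is used.

```yaml
# Sketch only: names and ports below are assumptions for illustration.
apiVersion: v1
kind: Service
metadata:
  name: kubelet
  namespace: default
  labels:
    app.kubernetes.io/name: kubelet
    monitoring-job: kubelet        # hypothetical label read via jobLabel
spec:
  clusterIP: None                  # assumed headless, endpoints managed elsewhere
  ports:
  - name: https-metrics            # assumed port name
    port: 10250
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet-monitor
  namespace: default
spec:
  jobLabel: monitoring-job         # job is taken from this label on the Service
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
  endpoints:
  - port: https-metrics
    scheme: https
    path: /metrics
```

Since the kubelet and cadvisor endpoints would share this one Service, jobLabel alone probably can't give them different job values, and a label added by hand may not survive if the operator manages the Service.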

May 16 '22 18:05 hjet

i should have some time to look into this shortly, but feel free to dig around in the meantime!!

May 16 '22 18:05 hjet

based on the ordering, i think using relabelings instead of metricRelabelings will solve this. i think it'll override the default kubelet value
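A minimal sketch of what that change could look like in the cadvisor ServiceMonitor. Only the job rule comes from this thread; the metadata, selector, port name, and path are assumptions about the quickstart manifest, and the replacement value shown is the one the OSS mixin dashboards expect. Target relabelings run before the scrape, so the rewritten job should also end up on up, whereas metricRelabelings only touch the scraped samples.

```yaml
# Sketch only: everything except the job rule is an assumed manifest shape.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cadvisor-monitor
  namespace: default
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kubelet
  endpoints:
  - port: https-metrics
    scheme: https
    path: /metrics/cadvisor
    relabelings:                   # applied to the target, before scraping
    - action: replace
      targetLabel: job
      replacement: cadvisor        # or integrations/kubernetes/cadvisor for the cloud integration
```

With this in place, `label_values(up{job="cadvisor"}, cluster)` would have matching series to draw from, assuming a cluster label is set elsewhere.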

May 16 '22 18:05 hjet

removing bug label for now, will be able to test shortly

May 16 '22 18:05 hjet

@hjet I can verify it works.

Jun 10 '22 19:06 Wouter0100

> @hjet I can verify it works.

awesome, thanks for verifying! this got away from me - have had a busy couple of weeks. will get to this soon but in the meantime anyone should feel free to put up a PR here

Jun 15 '22 23:06 hjet

I took a stab at this in https://github.com/grafana/agent/pull/1810, PTAL.

Jun 21 '22 09:06 flokli

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!

Jul 22 '22 00:07 github-actions[bot]

Not stale, waiting for https://github.com/grafana/agent/pull/1810 to be merged.

Jul 22 '22 07:07 flokli

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!

Aug 22 '22 00:08 github-actions[bot]

Not stale.

Aug 22 '22 12:08 flokli
