
The dashboards for monitoring Tempo seem to be broken

Open · chenlujjj opened this issue 1 year ago · 6 comments

Describe the bug

Hi team, I deployed tempo-distributed in a k8s cluster and tried to monitor it with the dashboards here, but I found that they are broken. For example:

  • the tempo_build_info metric has no cluster label, so the dashboard's cluster variable has no values at all
  • the distributor does not expose a tempo_receiver_accepted_spans metric, but the distributor monitoring panel uses it
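Why a missing label empties the variable can be sketched as follows. This is a minimal model, not Grafana's code: it assumes the cluster variable is populated from the distinct values of the cluster label on tempo_build_info (similar in spirit to a Grafana query such as label_values(tempo_build_info, cluster)). The helper label_values, the job value, and the cluster name my-cluster are all hypothetical.

```python
def label_values(series: list[dict[str, str]], metric: str, label: str) -> list[str]:
    """Sorted distinct values of `label` across the series of `metric` (hypothetical helper)."""
    return sorted({s[label] for s in series if s.get("__name__") == metric and label in s})


# Hypothetical series as stored when the scrape config adds no cluster label.
scraped = [{"__name__": "tempo_build_info", "job": "tempo/distributor"}]
# The same series after a scrape config attaches a (hypothetical) cluster label.
relabelled = [{**s, "cluster": "my-cluster"} for s in scraped]

print(label_values(scraped, "tempo_build_info", "cluster"))     # []
print(label_values(relabelled, "tempo_build_info", "cluster"))  # ['my-cluster']
```

With no series carrying the label, the variable has nothing to offer, and every panel filtering on it matches nothing.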

Expected behavior

The dashboards should work and display the metrics correctly.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: helm

Additional Context

chenlujjj, Aug 22 '24 08:08

I found a PR that may be related to the tempo_receiver_accepted_spans metric: https://github.com/grafana/tempo/pull/3917. I'll try upgrading my Tempo deployment.

chenlujjj, Aug 22 '24 08:08

Hi, the tempo_receiver_accepted_spans metric will be available in the 2.6.0 release; after that, the Helm chart will need to be updated. As for the tempo_build_info metric, we use the same dashboards and they work for us. Maybe @zalegrala knows more.

javiermolinar, Aug 22 '24 09:08

Thanks @javiermolinar

Does the tempo_build_info metric in your stack have a cluster label? Below is what I get from one of the Tempo distributor instances:

[screenshot]

chenlujjj avatar Aug 22 '24 10:08 chenlujjj

Here is where it is populated: https://github.com/grafana/tempo/blob/fbf249a41fdc9ee9ddc8168c4a4f92e426f92bb0/cmd/tempo/build/build.go#L20

The cluster label is probably added by the Kubernetes relabel configuration; that way, all our metrics include the cluster information.

javiermolinar, Aug 22 '24 11:08

Got it!

chenlujjj, Aug 22 '24 11:08

That's right. Add cluster and namespace labels in the scrape configs; the dashboard queries should then work as intended.
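What such relabelling does to a target's labels can be modelled with a simplified version of Prometheus' `replace` relabel action. This is a minimal sketch under stated assumptions, not Prometheus' implementation: it covers only `replace`, supports only numeric `$N`/`${N}` references, and the class ReplaceRule, the function apply_replace, the cluster name my-cluster and the job value are hypothetical. `__meta_kubernetes_namespace` is the namespace label exposed by Prometheus' Kubernetes service discovery. In Prometheus, the resulting target labels are attached to every series scraped from that target, including tempo_build_info.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ReplaceRule:
    """A relabel rule with simplified Prometheus `replace` semantics (hypothetical)."""
    target_label: str
    source_labels: list[str] = field(default_factory=list)
    regex: str = "(.*)"  # matched against the whole joined value (anchored)
    replacement: str = "$1"
    separator: str = ";"


def _expand(template: str, match: re.Match) -> str:
    # Substitute $N / ${N} with capture group N; unknown groups expand to "".
    def sub(m: re.Match) -> str:
        idx = int(m.group(1) or m.group(2))
        if idx > match.re.groups:
            return ""
        return match.group(idx) or ""
    return re.sub(r"\$(?:\{(\d+)\}|(\d+))", sub, template)


def apply_replace(labels: dict[str, str], rules: list[ReplaceRule]) -> dict[str, str]:
    out = dict(labels)
    for rule in rules:
        # Join the source label values; a missing label counts as "".
        value = rule.separator.join(out.get(name, "") for name in rule.source_labels)
        match = re.fullmatch(rule.regex, value)
        if match is None:
            continue  # no match: this rule leaves the labels unchanged
        result = _expand(rule.replacement, match)
        if result:
            out[rule.target_label] = result
        else:
            out.pop(rule.target_label, None)  # an empty result removes the label
    # Labels starting with "__" (service-discovery metadata) are dropped.
    return {k: v for k, v in out.items() if not k.startswith("__")}


rules = [
    # No source labels: "(.*)" matches "", so the static value is written.
    ReplaceRule(target_label="cluster", replacement="my-cluster"),
    # Copy the namespace found by Kubernetes service discovery.
    ReplaceRule(target_label="namespace", source_labels=["__meta_kubernetes_namespace"]),
]

target = {"__meta_kubernetes_namespace": "tempo", "job": "tempo/distributor"}
labels = apply_replace(target, rules)
print(labels)  # {'job': 'tempo/distributor', 'cluster': 'my-cluster', 'namespace': 'tempo'}
```

A static cluster value is used here because Kubernetes service discovery does not report a cluster name; the namespace, by contrast, can be copied from discovery metadata.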

zalegrala, Aug 22 '24 17:08

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply the keepalive label to exempt this issue.

github-actions[bot], Oct 22 '24 00:10