Backend response duration is too high

Open tcaty opened this issue 2 years ago • 2 comments

Describe the bug Hi! We use Grafana Tempo in our team. Recently we ran into an issue where a simple query takes a very long time. How can we tune Tempo's performance so that traces show up immediately?

Query:

{resource.service.name="${service_name}" && resource.env="${env}" && 400 <= .http.status_code && .http.status_code != 404 && .http.status_code < 600}

Response time:

image

To Reproduce Steps to reproduce the behavior:

  1. Start chart tempo-operational v1.7.1
  2. Perform Operations (Read).
  3. Wait too long

Expected behavior See traces immediately

Environment:

Additional Context We use the tempo-operational dashboard to monitor our Tempo instance. Here is what we see in the screenshots below:

image

We think the problem is in the Querier component itself, so we gave it a lot of resources, but it is still slow.

querier:
  replicas: 2
  resources:
    requests:
      cpu: 2
      memory: 2Gi
    limits:
      cpu: 8
      memory: 10Gi

How can we boost performance?

tcaty avatar Feb 22 '24 13:02 tcaty

There are a lot of ways to improve the perf of TraceQL! Listed in the order I think you should consider them:

  1. An instant big win would be to add scopes to all of your attributes:
{resource.service.name="${service_name}" && resource.env="${env}" && 400 <= span.http.status_code && span.http.status_code != 404 && span.http.status_code < 600}
  1. Set up gRPC streaming (https://grafana.com/docs/tempo/latest/api_docs/#tempo-grpc-api). This also (currently) requires setting a Grafana feature flag.

  2. Configure dedicated columns: docs blogpost

  3. Use multiple caching layers, which were added in 2.4: https://grafana.com/docs/tempo/next/configuration/#cache

  4. Search perf configurables. This advice is a bit out of date and only applies once you start scaling Tempo quite large. I would ignore the serverless parts (we have had issues getting good perf), but the discussion of the major tunables is still correct. If you are running 50+ queriers I would start to care about this. https://grafana.com/docs/tempo/latest/operations/backend_search/

joe-elliott avatar Feb 22 '24 15:02 joe-elliott
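
As a rough illustration of the gRPC streaming and dedicated-column suggestions, a Tempo configuration fragment might look like the sketch below. It is not taken from the thread: the attribute name `env` is borrowed from the query in the issue purely as an example, and where these keys sit in your Helm values depends on the chart, so check them against the Tempo configuration docs for your version.

```yaml
# Sketch only: verify key names against the Tempo docs for your version.

# Lets Grafana stream partial TraceQL search results from Tempo.
# Grafana additionally needed a feature toggle at the time
# (traceQLStreaming, as far as I know).
stream_over_http_enabled: true

storage:
  trace:
    block:
      # Dedicated columns require the vParquet3 block format or newer.
      version: vParquet3
      parquet_dedicated_columns:
        # Hypothetical choice: give the frequently queried resource
        # attribute "env" its own Parquet column.
        - name: env
          type: string
          scope: resource
```

Dedicated columns should only be expected to help for blocks written after the change, so any improvement would show up gradually as new data arrives.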

@joe-elliott thank you for your reply! I'll try it next week and report back on what actually helped us!

tcaty avatar Feb 24 '24 10:02 tcaty

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.

github-actions[bot] avatar Apr 25 '24 00:04 github-actions[bot]

Sorry for the long delay; here is our feedback. We applied suggestions 1, 3, and 4, and I can say with confidence that ordering them by performance impact was exactly right for us. We noticed improvements immediately after optimizing our TraceQL queries. The second one helped a lot as well. We drop the heaviest, least useful attributes in our collector, and the results are clear: since these changes our storage fills up more slowly and Tempo search is faster. The fourth one brought the smallest improvement, but it's still better than nothing :) Thank you again, @joe-elliott!

tcaty avatar May 02 '24 19:05 tcaty