
2x memory increase after upgrade

Open anjmao opened this issue 2 years ago • 4 comments

Describe the bug: After upgrading Tempo from grafana/tempo:1e8583d9 to grafana/tempo:1.4.1, memory usage increased from 4GiB to 8GiB. CPU usage also increased. [Screenshot 2022-06-27 at 22 27 33]

To Reproduce: Deploy Helm chart grafana/tempo version 0.15.4 (Grafana Tempo Single Binary Mode) with the values below.

Helm values

fullnameOverride: tempo
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "3100"
  prometheus.io/path: "/metrics"
tempo:
  retention: 48h
  storage:
    trace:
      backend: local
  resources:
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      memory: 8Gi
persistence:
  enabled: true
  accessModes:
    - ReadWriteOnce
  size: 80Gi

Expected behavior: No memory or CPU increase after the upgrade.

Environment:

  • Infrastructure: Kubernetes 1.22
  • Deployment tool: helm

Additional Context: Nothing specific. Sometimes I see errors like this:

rpc error: code = FailedPrecondition desc = LIVE_TRACES_EXCEEDED: max live traces exceeded for tenant single-tenant: per-user traces limit (local: 10000 global: 0 actual local: 10000) exceeded

I will need to increase the live traces limit, but I was hitting this limit with the old version too.
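
For reference, a minimal sketch of raising that limit, assuming it is controlled by the `max_traces_per_user` setting in Tempo's `overrides` block. The value 50000 is only illustrative, and where this fragment goes in the Helm values depends on the chart version:

```yaml
# tempo.yaml (fragment); the value is illustrative, not a recommendation
overrides:
  # Per-tenant cap on live traces; the error above reports the current cap as 10000
  max_traces_per_user: 50000
```

A higher cap lets more traces stay in memory at once, so it is likely to raise memory usage further.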

anjmao avatar Jun 27 '22 20:06 anjmao

The previous version of Tempo you were running dates all the way back to December 2020: https://github.com/grafana/tempo/commit/1e8583d9a108496a35c235bb6a95ede860aff5b9. Since that version is 1.5 years old, it's hard to tell which changes increased memory and CPU usage. Did any of your config change?

kvrhdn avatar Jun 28 '22 11:06 kvrhdn

I compared the configs rendered by the old and new Helm charts. I'm using the default values.

helm template test grafana/tempo --version=0.15.4
helm template test grafana/tempo --version=0.7.1

On both versions tempo.yaml seems to be almost the same. Probably some default configs changed in Tempo itself, or there is a regression.

anjmao avatar Jun 29 '22 13:06 anjmao

I believe Tempo Search is enabled by default in the new version. Could you please check that -- Are you able to search for traces in the Grafana UI? The elevated memory usage might be related to that.

annanay25 avatar Jun 29 '22 13:06 annanay25

I think I found the issue here https://github.com/grafana/helm-charts/blob/main/charts/tempo/values.yaml#L106

After changing it to the block_retention field, usage went back to normal. The compacted_block_retention setting defaults to 1h in Tempo.
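
A sketch of what the corrected compactor section might look like, assuming the chart was passing the `retention` value (48h in the values above) to `compacted_block_retention` rather than `block_retention`. That mapping is inferred from the linked line and this comment, not verified against the chart:

```yaml
# tempo.yaml (fragment); the 48h value is taken from the Helm values in this issue
compactor:
  compaction:
    # How long blocks are kept before deletion, i.e. the intended trace retention
    block_retention: 48h
    # How long already-compacted blocks are kept before cleanup; 1h is the Tempo default mentioned above
    compacted_block_retention: 1h
```

With `compacted_block_retention` at 48h, already-compacted blocks would be kept for two days instead of one hour, which could explain the extra memory and CPU usage.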

anjmao avatar Jul 01 '22 15:07 anjmao

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.

github-actions[bot] avatar Nov 14 '22 00:11 github-actions[bot]