
[Tempo] - Overrides section changes

Open bmgante opened this issue 1 year ago • 17 comments

Hi, I'm looking to upgrade to Tempo 2.3.0 and noticed that the following section has changed: [screenshot]

The updated Helm chart doesn't seem to reflect this change, or am I wrong?

  overrides:
    {{- toYaml .Values.global_overrides | nindent 2 }}
    {{- if .Values.metricsGenerator.enabled }}
    metrics_generator_processors:
    {{- range .Values.global_overrides.metrics_generator_processors }}
    - {{ . }}
    {{- end }}
    {{- end }}

Does my values.yaml need to be changed? It is currently as below:

# Global overrides
global_overrides:
  per_tenant_override_config: /runtime-config/overrides.yaml
  metrics_generator_processors:
    - service-graphs
    - span-metrics
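
For reference, my reading of that template with these values (assuming metricsGenerator.enabled is true) is that it renders roughly the following, with metrics_generator_processors emitted twice, once by toYaml and once by the range loop:

```yaml
overrides:
  per_tenant_override_config: /runtime-config/overrides.yaml
  metrics_generator_processors:
    - service-graphs
    - span-metrics
  metrics_generator_processors:
    - service-graphs
    - span-metrics
```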

Thanks

bmgante avatar Nov 27 '23 18:11 bmgante

+1. At this moment the new style is not working via Helm values. As a temporary solution, add the section manually:

  values: |-
    overrides: |
      overrides:
        "*":
          ingestion_burst_size_bytes: 50000000
          ingestion_rate_limit_bytes: 35000000

edgarkz avatar Dec 05 '23 14:12 edgarkz

What if I change the values.yaml this way, would it work?

From:

  overrides:
    {{- toYaml .Values.global_overrides | nindent 2 }}
    {{- if .Values.metricsGenerator.enabled }}
    metrics_generator_processors:
    {{- range .Values.global_overrides.metrics_generator_processors }}
    - {{ . }}
    {{- end }}
    {{- end }}

...
global_overrides:
  per_tenant_override_config: /runtime-config/overrides.yaml
  #metrics_generator_processors: []
  metrics_generator_processors:
    - service-graphs
    - span-metrics

To:

  overrides:
  defaults:
    {{- toYaml .Values.global_overrides | nindent 2 }}
    {{- if .Values.metricsGenerator.enabled }}
    metrics_generator:
    {{- range .Values.global_overrides.metrics_generator.processors }}
    - {{ . }}
    {{- end }}
    {{- end }}

...
global_overrides:
  per_tenant_override_config: /runtime-config/overrides.yaml
  #metrics_generator_processors: []
  metrics_generator:
    processors:
      - service-graphs
      - span-metrics
      - local-blocks

bmgante avatar Dec 05 '23 19:12 bmgante

@edgarkz could you please elaborate a bit more on how to apply your manual workaround so that the Helm chart works with Tempo 2.3.x and the new override syntax?

bmgante avatar Dec 06 '23 22:12 bmgante

Yeah, we definitely need this; it looks like https://github.com/grafana/helm-charts/pull/2825 is necessary. If you're like me and have some global overrides, it may make sense to wait until this patch is released before upgrading to the chart versions that use 2.3. Losing some of those configs suddenly would have really hurt if I hadn't checked the Grafana update notes. Significant changes like this should also be in the chart release notes, since they directly relate to the chart.

AlexDCraig avatar Dec 08 '23 23:12 AlexDCraig

@edgarkz could you please elaborate a bit more on how to apply your manual workaround so that the Helm chart works with Tempo 2.3.x and the new override syntax?

It doesn't work with the new syntax. I have added old-syntax overrides to make it work in Tempo 2.3:

  values: |-
    overrides: |
      overrides:
        "*":

edgarkz avatar Dec 13 '23 16:12 edgarkz

TL;DR

On the tempo-distributed Helm chart, instead of adding an overrides parameter, use global_overrides as a workaround:

global_overrides:
  defaults:
    ingestion:
      rate_limit_bytes: 32000000 # 32MB
      burst_size_bytes: 48000000 # 48MB
      max_traces_per_user: 50000

This is a workaround. global_overrides isn't the right place, but since the template outputs its content into the right final overrides block, you can use it for now.

Why is the issue happening?

The Grafana Tempo overrides documentation is correct! If your final YAML file ends up with an overrides block, it will work. However, the tempo-distributed Helm chart is where the issue happens. Since version 1.8.0, this Helm chart has been using a new block called overrides that accepts a string. Even if you provide the proper string content, it will fail simply because the Helm template doesn't add this block to the final YAML file.

Basically, Helm reads the content of that block and assigns it to a variable, tempo.OverridesConfig (reference). Then it creates a ConfigMap that uses the content of this variable as a new file, overrides.yaml (reference). And here comes the issue: when it generates the final YAML (tempo.yaml), it does not add the overrides block (reference). It leaves that content in the external overrides.yaml file.

Possible solutions

I think that in this block the Helm chart should add not only global_overrides but also the overrides block as its content. This would output the overrides content into the final YAML file, solving the issue because the Tempo binary would then read the expected overrides block properly.
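
A hypothetical sketch of that change in the chart template, assuming the chart's overrides value were changed from a string to a map (illustrative only, not the actual chart code):

```yaml
overrides:
  {{- toYaml .Values.global_overrides | nindent 2 }}
  {{- with .Values.overrides }}
  {{- /* hypothetical: also render the chart's overrides value into the final tempo.yaml */}}
  {{- toYaml . | nindent 2 }}
  {{- end }}
```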

pantuza avatar Jun 11 '24 17:06 pantuza

At the moment I am unable to enable the service graph. Any tips on how to do this?

How can I activate enable_virtual_node_label?

[screenshot]

batazor avatar Jul 21 '24 09:07 batazor

global_overrides doesn't seem to work anymore (chart version 1.15.2), and tempo.structuredConfig can't be used either because of the invalid legacy format used for the multi-tenant config, even though I'm not using multitenancy. I couldn't find any way to make the metrics generator work with service graphs.

ThomasVitale avatar Jul 29 '24 12:07 ThomasVitale

Below is the part of my values file which makes the metrics generator work:

  # Global overrides
  global_overrides:
    per_tenant_override_config: /runtime-config/overrides.yaml
    defaults:
      metrics_generator:
        processors: [service-graphs, span-metrics]


  # Per tenants overrides
  overrides: {}

chart version is 1.16.2

@ThomasVitale @batazor

chenlujjj avatar Aug 13 '24 07:08 chenlujjj

Another confused user here. After spending almost an hour reading everywhere, I think the intended usage is:

overrides:
    '*':
        metrics_generator:
            processors: ['service-graphs', 'span-metrics']

Helm chart version: grafana/tempo-distributed: 1.21.0

But as soon as the config is applied, I'm getting rate-limit errors from Alloy.

2024-11-07T03:09:59.590772917Z stderr F ts=2024-11-07T03:09:59.590614967Z level=error msg="Exporting failed. Dropping data." component_path=/ component_id=otelcol.exporter.otlp.tempo error="not retryable error: Permanent error: rpc error: code = ResourceExhausted desc = RATE_LIMITED: ingestion rate limit (local: 0 bytes, global: 0 bytes) exceeded while adding 13541 bytes for user single-tenant" dropped_items=10

It seems that the overrides config doesn't merge with the global config, so you have to reconfigure everything again, which is crazy and error-prone. I don't want to go down this rabbit hole and figure out which configs need to be restored under overrides:.

If my analysis is correct, you should not use the overrides: config in the Helm chart unless you know what you're doing. It's not doing overrides as the name suggests; it's doing replacements, the opposite of how values.yaml works in a Helm chart. If I've misunderstood the config, please correct me.
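
If that reading is right, a per-tenant overrides: block would presumably need to restate every limit it clobbers. A sketch, with illustrative values (the rate and burst numbers here are Tempo's documented defaults, not recommendations):

```yaml
overrides:
  '*':
    ingestion:
      rate_limit_bytes: 15000000  # restate, or it effectively becomes 0
      burst_size_bytes: 20000000
    metrics_generator:
      processors: ['service-graphs', 'span-metrics']
```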

Fortunately this config worked for me. Service Graph is populated, Alloy no longer getting rate limited errors.

global_overrides:
    defaults:
        metrics_generator:
            processors: ['service-graphs', 'span-metrics']

@grafana can we improve this situation? overrides: config seems useless as it stands.

Related issues:

  • https://github.com/grafana/tempo/issues/3855
  • https://github.com/grafana/tempo/issues/3795
  • https://github.com/grafana/helm-charts/issues/3171
  • https://github.com/grafana/helm-charts/issues/3134
  • https://github.com/grafana/tempo/issues/3820

shinebayar-g avatar Nov 07 '24 03:11 shinebayar-g

Hi there! I noticed that we had a bunch of issues open for the overrides settings for the tempo-distributed Helm chart. We have two updates that add more doc for overrides:

  • https://github.com/grafana/helm-charts/pull/3468
  • https://github.com/grafana/tempo/pull/4415

Please let me know if this addresses the issue.

knylander-grafana avatar Dec 05 '24 22:12 knylander-grafana

I am struggling to understand how to use the metricsGenerator with the tempo (not tempo-distributed) Helm chart. From the docs and all the explanations, it looks like I would need values like:

tempo:
  metricsGenerator:
    enabled: true
  global_overrides:
    metrics_generator_processors:
      - local-blocks

The issue is with https://github.com/grafana/helm-charts/blob/dfeecb93fff3057d8690c6c13cd2cbe62d08c55d/charts/tempo/values.yaml#L171-L174

I do not see any way to extend metrics_generator_processors, as it is created conditionally on {{- if .Values.tempo.metricsGenerator.enabled }}

and the resulting ConfigMap has:

    overrides:
          metrics_generator_processors:
          - local-blocks
          per_tenant_override_config: /conf/overrides.yaml
          metrics_generator_processors:
          - 'service-graphs'
          - 'span-metrics'

The duplicated metrics_generator_processors key breaks Tempo.

lkiii avatar Jan 16 '25 10:01 lkiii

Same question as above, but for tempo-distributed. How is it possible to enable "local-blocks" and set its parameters (e.g. filter_server_spans: false)? Also, how do I disable the other processors? Thank you.

Update: this works for me. Assuming you have this in your Chart.yaml:

- alias: tempo
  condition: tempo.enabled
  name: tempo-distributed
  repository: https://grafana.github.io/helm-charts
  version: ^1.9.2                                                  

So, values.yaml (note: I bumped the Tempo version, switched on gRPC, and increased the ingester replicas):

tempo:
  enabled: true

  tempo:
    image:
      tag: 2.6.1

  ingester:
    replicas: 2

  traces:
    otlp:
      grpc:
        enabled: true

  metricsGenerator:
    enabled: true
    config:
      storage:
        remote_write:
          - url: http://lgtm-mimir-nginx/api/v1/push  # URL of locally running Mimir instance.
            send_exemplars: true # Send exemplars along with their metrics.
      processor:
        local_blocks:
          filter_server_spans: false
          flush_to_storage: true

  global_overrides:
      metrics_generator_processors:
        - local-blocks

skorzhevsky avatar Jan 16 '25 12:01 skorzhevsky

For tempo-distributed I use the following:

global_overrides:
   metrics_generator_processors:
       - service-graphs
       - local-blocks

which gets expanded to the following in the ConfigMap:

overrides:
  metrics_generator_processors:
  - service-graphs
  - local-blocks
  per_tenant_override_config: /runtime-config/overrides.yaml

This works for me for service maps, but I still get a deprecation warning.
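
The deprecation warning presumably comes from the legacy flat metrics_generator_processors key; the new-style equivalent, which chenlujjj reported working above on chart 1.16.2, would be:

```yaml
global_overrides:
  defaults:
    metrics_generator:
      processors:
        - service-graphs
        - local-blocks
```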

rlex avatar Jan 16 '25 14:01 rlex

At the moment I am unable to enable the service graph. Any tips on how to do this?

How can I activate enable_virtual_node_label?

[screenshot]

Can anyone help?

bobafettfrom avatar Feb 05 '25 07:02 bobafettfrom

Apparently global_overrides has been replaced by overrides; the README on main looks wrong, as it still mentions global_overrides. When I set overrides in the latest chart (1.33.0) and attempt to upgrade, I just get incessant crash-looping and errors around overrides.LegacyOverrides. Not sure what the deal is, but now nothing seems to be working correctly. Dropping back to the previous chart.

overrides:
  defaults:
    ingestion:
      rate_strategy: local
      burst_size_bytes: 200000000
      rate_limit_bytes: 350000000
      max_bytes_per_tag_values_query: 10000000
      max_bytes_per_trace: 200000000
      max_traces_per_user: 3000000  
    metrics_generator:
      forwarder_queue_size: 100000
      forwarder_workers: 5
      processors:
        - service-graphs
        - span-metrics
        - local-blocks
level=warn ts=2025-04-03T19:50:42.299352049Z caller=main.go:133 msg="-- CONFIGURATION WARNINGS --"
level=warn ts=2025-04-03T19:50:42.299396478Z caller=main.go:139 msg="c.StorageConfig.Trace.Block.Version != \"v2\" but v2_in_buffer_bytes is set" explain="This setting is only used in v2 blocks"
level=warn ts=2025-04-03T19:50:42.299404618Z caller=main.go:139 msg="c.StorageConfig.Trace.Block.Version != \"v2\" but v2_out_buffer_bytes is set" explain="This setting is only used in v2 blocks"
level=warn ts=2025-04-03T19:50:42.299409858Z caller=main.go:139 msg="c.StorageConfig.Trace.Block.Version != \"v2\" but v2_prefetch_traces_count is set" explain="This setting is only used in v2 blocks"
level=error ts=2025-04-03T19:50:42.394921997Z caller=app.go:223 msg="module failed" module=overrides err="starting module overrides: invalid service state: Failed, expected: Running, failure: failed to start subservices: not healthy, 0 terminated, 1 failed: [failed to load runtime config: load file: yaml: unmarshal errors:\n  line 3: field ingestion not found in type overrides.LegacyOverrides\n  line 10: field metrics_generator not found in type overrides.LegacyOverrides]"
level=error ts=2025-04-03T19:50:42.395067187Z caller=app.go:223 msg="module failed" module=metrics-generator err="failed to start metrics-generator, because it depends on module overrides, which has failed: invalid service state: Failed, expected: Running, failure: starting module overrides: invalid service state: Failed, expected: Running, failure: failed to start subservices: not healthy, 0 terminated, 1 failed: [failed to load runtime config: load file: yaml: unmarshal errors:\n  line 3: field ingestion not found in type overrides.LegacyOverrides\n  line 10: field metrics_generator not found in type overrides.LegacyOverrides]"
level=warn ts=2025-04-03T19:50:42.395236867Z caller=module_service.go:118 msg="module failed with error" module=usage-report err="context canceled"
level=error ts=2025-04-03T19:50:42.395362337Z caller=memberlist_client.go:731 msg="failed to resolve members" addrs=dns+tempo-gossip-ring:7946 err="lookup IP addresses \"tempo-gossip-ring\": lookup tempo-gossip-ring: operation was canceled"

jlcrow avatar Apr 03 '25 20:04 jlcrow

What about dimensions as well? I'm getting the same legacy error as above. I had this config in the Helm chart:

      overrides:
        default:
          metrics_generator:
            processor:
              span_metrics:
                  dimensions:
                    - http.method
                    - http.target
                    - http.status_code
                    - service.version
              service_graphs:
                  dimensions:
                    - http.method
                    - http.target
                    - http.status_code
                    - service.version

but I'm not sure where to put them now.

xakaitetoia avatar Apr 10 '25 17:04 xakaitetoia

Does this doc help? https://grafana.com/docs/helm-charts/tempo-distributed/next/get-started-helm-charts/#optional-use-global-or-per-tenant-overrides (the docs update was from this PR, which documents changes from this PR)

We should make sure to update the readme.

knylander-grafana avatar Aug 07 '25 14:08 knylander-grafana

Does this doc help? https://grafana.com/docs/helm-charts/tempo-distributed/next/get-started-helm-charts/#optional-use-global-or-per-tenant-overrides (the docs update was from this PR, which documents changes from this PR)

We should make sure to update the readme.

It does not help. The issue being reported by @xakaitetoia (which I am experiencing too) seems to be that any overrides.defaults.metrics_generator.processor entry in the values causes the resulting pod to crash-loop with a legacyConfig error:

failed parsing config: failed to parse configFile /conf/tempo.yaml: yaml: unmarshal errors:
  line 47: field defaults not found in type overrides.legacyConfig

For reference, here is my Helm release (via Terraform):
resource "helm_release" "tempo" {
  name       = "tempo"
  namespace  = local.monitor_ns
  repository = "https://grafana.github.io/helm-charts"
  chart      = "tempo"
  version    = local.monitor_tempo_version # "1.23.2"

  values = [
    yamlencode({
      persistence = {
        enabled = true
        size    = "20Gi"
      }

      tempo = {
        # 14d
        retention = "336h"

        metricsGenerator = {
          enabled        = true
          remoteWriteUrl = "http://kube-prometheus-stack-prometheus:${local.monitor_prometheus_port}/api/v1/write"
        }

        queryFrontend = {
          metrics = {
            max_duration         = "168h"
            concurrent_jobs      = 32
            target_bytes_per_job = 1250000000 # ~1.25GB
          }
        }

        overrides = {
          defaults = {
            metrics_generator = {
              processors = ["service-graphs", "span-metrics", "local-blocks"]
              # adding this causes the error, without it is fine
              # processor = {
              #   local_blocks = {
              #     filter_server_spans = false
              #     flush_to_storage = true
              #   }
              # }
            }
          }
        }
      }
    })
  ]
}

The relevant part of the generated ConfigMap:

overrides:
      defaults:
        metrics_generator:
          processor:
            local_blocks:
              filter_server_spans: false
              flush_to_storage: true
          processors:
          - service-graphs
          - span-metrics
          - local-blocks
      per_tenant_override_config: /conf/overrides.yaml
metrics_generator:
      storage:
        path: "/tmp/tempo"
        remote_write:
          - url: http://kube-prometheus-stack-prometheus:9090/api/v1/write
      traces_storage:
        path: "/tmp/traces"

ntr-switchdin avatar Sep 08 '25 06:09 ntr-switchdin

I gave up on upgrading Tempo. My last attempt left all pods in a state where they never started; I never got beyond the config changes and had to restore the entire namespace from backup.

jlcrow avatar Sep 08 '25 12:09 jlcrow

I wasted a couple of hours trying to configure options for the local-blocks processor in the monolithic Helm chart. I think it's just in a broken state right now because, as mentioned in #3640, it is not possible to configure processor options such as

 local_blocks:
    filter_server_spans: false
    flush_to_storage: true

in the tempo.metricsGenerator section of values.yaml.

And looking at the docs for the overrides section (https://grafana.com/docs/tempo/next/configuration/#standard-overrides), these are the only accepted values (which don't appear to cause crashes):

        local-blocks:
          [max_live_traces: <int>]
          [max_block_duration: <duration>]
          [max_block_bytes: <int>]
          [flush_check_period: <duration>]
          [trace_idle_period: <duration>]
          [complete_block_timeout: <duration>]
          [concurrent_blocks: <duration>]
          [filter_server_spans: <bool>]

So if my understanding of this mess is correct, there is no place to configure, for example, the following without errors:

        local_blocks:
            block: <Block config>
            search: <Search config>
            [filter_server_spans: <bool> | default = true]
            [flush_to_storage: <bool> | default = false]
            [time_overlap_cutoff: <float64> | default = 0.2]

osagemo avatar Oct 10 '25 08:10 osagemo

With the latest tempo-distributed Helm chart (1.48.0 at this moment), neither global_overrides nor overrides works in any way: you either get an error on container start, or the config is just ignored and not applied.

But the good news is that per_tenant_overrides works. To apply per_tenant_overrides globally, just use a wildcard:

per_tenant_overrides: 
  "*":
    ingester:
      max_block_bytes: 524288000
      

vpsgetcom avatar Oct 10 '25 19:10 vpsgetcom