[Tempo] - Overrides section changes
Hi,
I'm planning to upgrade to Tempo 2.3.0 and noticed that the following section has changed. The updated Helm chart doesn't seem to reflect this change, or am I wrong?
overrides:
  {{- toYaml .Values.global_overrides | nindent 2 }}
  {{- if .Values.metricsGenerator.enabled }}
  metrics_generator_processors:
  {{- range .Values.global_overrides.metrics_generator_processors }}
  - {{ . }}
  {{- end }}
  {{- end }}
Does my values.yaml need to be changed? It currently looks like this:
# Global overrides
global_overrides:
  per_tenant_override_config: /runtime-config/overrides.yaml
  metrics_generator_processors:
    - service-graphs
    - span-metrics
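For context, my understanding of the Tempo 2.3 change is that the flat overrides keys are deprecated in favor of an indented format, so the same settings in the final tempo.yaml would need to look roughly like this:

overrides:
  defaults:
    metrics_generator:
      processors:
        - service-graphs
        - span-metrics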
Thanks
+1. At the moment the new style does not work via Helm values.
As a temporary workaround, add this section manually:
values: |-
  overrides: |
    overrides:
      "*":
        ingestion_burst_size_bytes: 50000000
        ingestion_rate_limit_bytes: 35000000
Would it work if we changed the values.yaml this way?
From:
overrides:
  {{- toYaml .Values.global_overrides | nindent 2 }}
  {{- if .Values.metricsGenerator.enabled }}
  metrics_generator_processors:
  {{- range .Values.global_overrides.metrics_generator_processors }}
  - {{ . }}
  {{- end }}
  {{- end }}
...
global_overrides:
  per_tenant_override_config: /runtime-config/overrides.yaml
  #metrics_generator_processors: []
  metrics_generator_processors:
    - service-graphs
    - span-metrics
To:
overrides:
  defaults:
    {{- toYaml .Values.global_overrides | nindent 4 }}
    {{- if .Values.metricsGenerator.enabled }}
    metrics_generator:
      processors:
      {{- range .Values.global_overrides.metrics_generator.processors }}
      - {{ . }}
      {{- end }}
    {{- end }}
...
global_overrides:
  per_tenant_override_config: /runtime-config/overrides.yaml
  #metrics_generator_processors: []
  metrics_generator:
    processors:
      - service-graphs
      - span-metrics
      - local-blocks
@edgarkz could you please elaborate a bit more on how to apply your manual workaround to get the Helm chart working with Tempo 2.3.x and the new override syntax?
Yeah, we definitely need this; it looks like https://github.com/grafana/helm-charts/pull/2825 is necessary. If you're like me and have some global overrides, it may make sense to wait until this patch is released before upgrading to the chart versions that use 2.3. Losing some of those configs suddenly would have really hurt if I hadn't checked the Grafana update notes. Significant changes like this should also be in the chart release notes, since they directly relate to the chart.
> @edgarkz could you please elaborate a bit more on how to apply your manual workaround to get the Helm chart working with Tempo 2.3.x and the new override syntax?
It doesn't work with the new syntax. I added the old-syntax overrides to make it work in Tempo 2.3:
values: |-
  overrides: |
    overrides:
      "*":
TL;DR
On the tempo-distributed Helm chart, instead of adding an overrides parameter, use global_overrides as a workaround:
global_overrides:
  defaults:
    ingestion:
      rate_limit_bytes: 32000000 # 32MB
      burst_size_bytes: 48000000 # 48MB
      max_traces_per_user: 50000
This is a workaround: global_overrides isn't the right place, but since the template outputs its content into the right final overrides block, you can use it for now.
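With the values above, the template should render roughly this into the final tempo.yaml (my reconstruction, not verified chart output):

overrides:
  defaults:
    ingestion:
      rate_limit_bytes: 32000000
      burst_size_bytes: 48000000
      max_traces_per_user: 50000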
Why is the issue happening?
The Grafana Tempo overrides documentation is correct! If your final YAML file ends up with an overrides block, it will work.
However, the tempo-distributed Helm chart is where the issue happens. Since version 1.8.0, this chart uses a new block called overrides that accepts a string. Even if you provide the proper string content, it will fail simply because the Helm template doesn't add this block to the final YAML file.
Basically, Helm reads the content of that block into a variable tempo.OverridesConfig (reference).
Then it creates a ConfigMap that uses the content of this variable as a new file, overrides.yaml (reference).
And here comes the issue: when it generates the final YAML (tempo.yaml), it does not add the overrides block (reference). It leaves that content in the external file called overrides.yaml.
Possible solutions
I think that in this block the Helm chart should add not only global_overrides but also the overrides block as its content. This would output the overrides content into the final YAML file, solving the issue, because the Tempo binary would then read the expected overrides block properly.
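A rough sketch of what that template change could look like (hypothetical, not the actual chart code, and assuming the overrides string holds keys that belong directly under the final overrides block):

overrides:
  {{- toYaml .Values.global_overrides | nindent 2 }}
  {{- with .Values.overrides }}
  {{- tpl . $ | nindent 2 }}
  {{- end }}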
At the moment I am unable to enable the service graph. Any tips on how we can do this?
How can I activate enable_virtual_node_label?
global_overrides doesn't seem to work anymore (chart version 1.15.2), and tempo.structuredConfig can't be used either because of the invalid legacy format used for the multi-tenant config, even though I'm not using multitenancy. I couldn't find any way to make the metrics generator work with the service graph.
Below is the part of my values file that makes the metrics generator work:
# Global overrides
global_overrides:
  per_tenant_override_config: /runtime-config/overrides.yaml
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]
# Per-tenant overrides
overrides: {}
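With these values, the template dumps global_overrides verbatim under the overrides block, so the final tempo.yaml should contain roughly this (my reconstruction, not verified chart output):

overrides:
  per_tenant_override_config: /runtime-config/overrides.yaml
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]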
chart version is 1.16.2
@ThomasVitale @batazor
Another confused user here. After spending almost an hour reading everywhere, I think the intended usage was:
overrides:
  '*':
    metrics_generator:
      processors: ['service-graphs', 'span-metrics']
Helm chart version: grafana/tempo-distributed: 1.21.0
But as soon as the config is applied, I'm getting rate-limited errors in Alloy.
2024-11-07T03:09:59.590772917Z stderr F ts=2024-11-07T03:09:59.590614967Z level=error msg="Exporting failed. Dropping data." component_path=/ component_id=otelcol.exporter.otlp.tempo error="not retryable error: Permanent error: rpc error: code = ResourceExhausted desc = RATE_LIMITED: ingestion rate limit (local: 0 bytes, global: 0 bytes) exceeded while adding 13541 bytes for user single-tenant" dropped_items=10
It seems that the per-tenant override config doesn't merge with the global config, so you have to reconfigure everything again, which is crazy and error-prone. I don't want to go down this rabbit hole and figure out which configs need to be restored in overrides:.
If my analysis is correct, you should not use the overrides: config in the Helm chart unless you know what you're doing.
It's not doing overrides as the name suggests; it's doing replacement, the opposite of how values.yaml works in a Helm chart. If I've misunderstood the config, please correct me.
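If you do use the per-tenant overrides: block, my understanding is that you would have to restate every limit you still need, for example (the numbers are an assumption based on Tempo's documented ingestion defaults; verify them against your version):

overrides:
  '*':
    ingestion:
      rate_limit_bytes: 15000000
      burst_size_bytes: 20000000
    metrics_generator:
      processors: ['service-graphs', 'span-metrics']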
Fortunately, this config worked for me. The service graph is populated, and Alloy no longer gets rate-limited errors.
global_overrides:
  defaults:
    metrics_generator:
      processors: ['service-graphs', 'span-metrics']
@grafana can we improve this situation? overrides: config seems useless as it stands.
Related issues:
- https://github.com/grafana/tempo/issues/3855
- https://github.com/grafana/tempo/issues/3795
- https://github.com/grafana/helm-charts/issues/3171
- https://github.com/grafana/helm-charts/issues/3134
- https://github.com/grafana/tempo/issues/3820
Hi there! I noticed that we have a bunch of issues open for the overrides settings for the tempo-distributed Helm chart. We have two updates that add more documentation for overrides:
- https://github.com/grafana/helm-charts/pull/3468
- https://github.com/grafana/tempo/pull/4415
Please let me know if this addresses the issue.
I am struggling to understand how to use the metricsGenerator with the tempo (not tempo-distributed) Helm chart. From the docs and all the explanations, it looks like I would need values like:
tempo:
  metricsGenerator:
    enabled: true
  global_overrides:
    metrics_generator_processors:
      - local-blocks
The issue is with https://github.com/grafana/helm-charts/blob/dfeecb93fff3057d8690c6c13cd2cbe62d08c55d/charts/tempo/values.yaml#L171-L174
I do not see any way to extend metrics_generator_processors, as it is created under the condition {{- if .Values.tempo.metricsGenerator.enabled }},
and the resulting ConfigMap contains:
overrides:
  metrics_generator_processors:
    - local-blocks
  per_tenant_override_config: /conf/overrides.yaml
  metrics_generator_processors:
    - 'service-graphs'
    - 'span-metrics'
which breaks Tempo (duplicate metrics_generator_processors keys).
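For reference, what one would want the template to render is the key only once, with the lists merged. A sketch of the desired output (not what the chart currently produces):

overrides:
  per_tenant_override_config: /conf/overrides.yaml
  metrics_generator_processors:
    - 'service-graphs'
    - 'span-metrics'
    - local-blocks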
Same question as above, but for tempo-distributed: how is it possible to enable "local-blocks" and set its parameters (e.g. filter_server_spans: false), and also to disable the other processors? Thank you.
Update: this works for me, assuming you have this in your Chart.yaml:
- alias: tempo
  condition: tempo.enabled
  name: tempo-distributed
  repository: https://grafana.github.io/helm-charts
  version: ^1.9.2
And values.yaml (note: I bumped the Tempo version, switched on gRPC, and increased the ingester replicas):
tempo:
  enabled: true
  tempo:
    image:
      tag: 2.6.1
  ingester:
    replicas: 2
  traces:
    otlp:
      grpc:
        enabled: true
  metricsGenerator:
    enabled: true
    config:
      storage:
        remote_write:
          - url: http://lgtm-mimir-nginx/api/v1/push # URL of locally running Mimir instance.
            send_exemplars: true # Send exemplars along with their metrics.
      processor:
        local_blocks:
          filter_server_spans: false
          flush_to_storage: true
  global_overrides:
    metrics_generator_processors:
      - local-blocks
For tempo-distributed I use the following:
global_overrides:
  metrics_generator_processors:
    - service-graphs
    - local-blocks
which gets expanded to the following in the ConfigMap:
overrides:
  metrics_generator_processors:
    - service-graphs
    - local-blocks
  per_tenant_override_config: /runtime-config/overrides.yaml
This works for me for service maps, but I still get a deprecation warning.
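The deprecation warning is presumably about the legacy flat metrics_generator_processors keys. On chart versions that support the new indented format, the equivalent values would be something like this (untested on my side; another commenter reported this shape working on chart 1.16.2):

global_overrides:
  defaults:
    metrics_generator:
      processors:
        - service-graphs
        - local-blocks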
> At the moment I am unable to enable the service graph. Any tips on how we can do this?
> How can I activate enable_virtual_node_label?
Can anyone help?
Apparently global_overrides => overrides now; the README in main looks wrong, as it still mentions global_overrides. When I set overrides in the latest chart (1.33.0) and attempt to upgrade, I just get incessant crash-looping and errors around overrides.LegacyOverrides. Not sure what the deal is, but now nothing seems to be working correctly. Dropping back to the previous chart.
overrides:
  defaults:
    ingestion:
      rate_strategy: local
      burst_size_bytes: 200000000
      rate_limit_bytes: 350000000
      max_bytes_per_tag_values_query: 10000000
      max_bytes_per_trace: 200000000
      max_traces_per_user: 3000000
    metrics_generator:
      forwarder_queue_size: 100000
      forwarder_workers: 5
      processors:
        - service-graphs
        - span-metrics
        - local-blocks
level=warn ts=2025-04-03T19:50:42.299352049Z caller=main.go:133 msg="-- CONFIGURATION WARNINGS --"
level=warn ts=2025-04-03T19:50:42.299396478Z caller=main.go:139 msg="c.StorageConfig.Trace.Block.Version != \"v2\" but v2_in_buffer_bytes is set" explain="This setting is only used in v2 blocks"
level=warn ts=2025-04-03T19:50:42.299404618Z caller=main.go:139 msg="c.StorageConfig.Trace.Block.Version != \"v2\" but v2_out_buffer_bytes is set" explain="This setting is only used in v2 blocks"
level=warn ts=2025-04-03T19:50:42.299409858Z caller=main.go:139 msg="c.StorageConfig.Trace.Block.Version != \"v2\" but v2_prefetch_traces_count is set" explain="This setting is only used in v2 blocks"
level=error ts=2025-04-03T19:50:42.394921997Z caller=app.go:223 msg="module failed" module=overrides err="starting module overrides: invalid service state: Failed, expected: Running, failure: failed to start subservices: not healthy, 0 terminated, 1 failed: [failed to load runtime config: load file: yaml: unmarshal errors:\n line 3: field ingestion not found in type overrides.LegacyOverrides\n line 10: field metrics_generator not found in type overrides.LegacyOverrides]"
level=error ts=2025-04-03T19:50:42.395067187Z caller=app.go:223 msg="module failed" module=metrics-generator err="failed to start metrics-generator, because it depends on module overrides, which has failed: invalid service state: Failed, expected: Running, failure: starting module overrides: invalid service state: Failed, expected: Running, failure: failed to start subservices: not healthy, 0 terminated, 1 failed: [failed to load runtime config: load file: yaml: unmarshal errors:\n line 3: field ingestion not found in type overrides.LegacyOverrides\n line 10: field metrics_generator not found in type overrides.LegacyOverrides]"
level=warn ts=2025-04-03T19:50:42.395236867Z caller=module_service.go:118 msg="module failed with error" module=usage-report err="context canceled"
level=error ts=2025-04-03T19:50:42.395362337Z caller=memberlist_client.go:731 msg="failed to resolve members" addrs=dns+tempo-gossip-ring:7946 err="lookup IP addresses \"tempo-gossip-ring\": lookup tempo-gossip-ring: operation was canceled"
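Reading the error, the failure comes from the runtime config loader, so the chart appears to render these values into the per-tenant /runtime-config/overrides.yaml, and that file is still parsed in the legacy per-tenant format (tenant IDs mapped to flat legacy fields). A file that parser would accept looks roughly like this (my interpretation, reusing the legacy flat keys from earlier in this thread):

overrides:
  "*":
    ingestion_rate_limit_bytes: 350000000
    ingestion_burst_size_bytes: 200000000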
What about dimensions as well? I'm getting the same legacy error as above. I had this config in the Helm chart:
overrides:
  defaults:
    metrics_generator:
      processor:
        span_metrics:
          dimensions:
            - http.method
            - http.target
            - http.status_code
            - service.version
        service_graphs:
          dimensions:
            - http.method
            - http.target
            - http.status_code
            - service.version
but I'm not sure where to put them now.
Does this doc help? https://grafana.com/docs/helm-charts/tempo-distributed/next/get-started-helm-charts/#optional-use-global-or-per-tenant-overrides (the update to the docs came from this PR, which documents the changes from this PR)
We should make sure to update the README.
> Does this doc help? https://grafana.com/docs/helm-charts/tempo-distributed/next/get-started-helm-charts/#optional-use-global-or-per-tenant-overrides (the update to the docs came from this PR, which documents the changes from this PR)
> We should make sure to update the README.
It does not help. The issue being reported by @xakaitetoia (which I am experiencing too) seems to be that any overrides.defaults.metrics_generator.processor entry in the values causes the resulting pod to crash-loop with a legacyConfig error:
failed parsing config: failed to parse configFile /conf/tempo.yaml: yaml: unmarshal errors:
  line 47: field defaults not found in type overrides.legacyConfig
resource "helm_release" "tempo" {
name = "tempo"
namespace = local.monitor_ns
repository = "https://grafana.github.io/helm-charts"
chart = "tempo"
version = local.monitor_tempo_version # "1.23.2"
values = [
yamlencode({
persistence = {
enabled = true
size = "20Gi"
}
tempo = {
# 14d
retention = "336h"
metricsGenerator = {
enabled = true
remoteWriteUrl = "http://kube-prometheus-stack-prometheus:${local.monitor_prometheus_port}/api/v1/write"
}
queryFrontend = {
metrics = {
max_duration = "168h"
concurrent_jobs = 32
target_bytes_per_job = 1250000000 # ~1.25GB
}
}
overrides = {
defaults = {
metrics_generator = {
processors = ["service-graphs", "span-metrics", "local-blocks"]
# adding this causes the error, without it is fine
# processor = {
# local_blocks = {
# filter_server_spans = false
# flush_to_storage = true
# }
# }
}
}
}
}
})
]
}
Relevant generated ConfigMap:
overrides:
  defaults:
    metrics_generator:
      processor:
        local_blocks:
          filter_server_spans: false
          flush_to_storage: true
      processors:
        - service-graphs
        - span-metrics
        - local-blocks
  per_tenant_override_config: /conf/overrides.yaml
metrics_generator:
  storage:
    path: "/tmp/tempo"
    remote_write:
      - url: http://kube-prometheus-stack-prometheus:9090/api/v1/write
  traces_storage:
    path: "/tmp/traces"
I gave up on upgrading Tempo. The last attempt left all pods in a state where they never started; I never got beyond the config changes and had to restore the entire namespace from backup.
I wasted a couple of hours trying to configure options for the local-blocks processor in the monolithic Helm chart. I think it's just in a broken state right now because, as mentioned in #3640, it is not possible to configure processor options such as
local_blocks:
  filter_server_spans: false
  flush_to_storage: true
in the tempo.metricsGenerator section of values.yaml.
And looking at the docs for the overrides section (https://grafana.com/docs/tempo/next/configuration/#standard-overrides), these are the only accepted values (and they don't appear to cause crashes):
local-blocks:
  [max_live_traces: <int>]
  [max_block_duration: <duration>]
  [max_block_bytes: <int>]
  [flush_check_period: <duration>]
  [trace_idle_period: <duration>]
  [complete_block_timeout: <duration>]
  [concurrent_blocks: <duration>]
  [filter_server_spans: <bool>]
So if my understanding of this mess is correct, there is no place to configure, for example, the following without errors:
local_blocks:
  block: <Block config>
  search: <Search config>
  [filter_server_spans: <bool> | default = true]
  [flush_to_storage: <bool> | default = false]
  [time_overlap_cutoff: <float64> | default = 0.2]
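For tempo-distributed at least, an earlier comment in this thread set those processor-level options under metricsGenerator.config instead of under overrides, which sidesteps this limitation. A sketch based on that comment (I have not verified whether the monolithic chart offers an equivalent):

metricsGenerator:
  enabled: true
  config:
    processor:
      local_blocks:
        filter_server_spans: false
        flush_to_storage: true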
As of the latest tempo-distributed Helm chart version (1.48.0 at this moment), neither global_overrides nor overrides works in any way: you'll either get an error on container start, or the config will just be ignored and not applied.
The good news is that per_tenant_overrides works. To apply per_tenant_overrides globally, just use the wildcard:
per_tenant_overrides:
  "*":
    ingester:
      max_block_bytes: 524288000