Configuration for querier.frontend_worker.grpc_client_config.max_recv_msg_size doesn't seem to work?
Describe the bug
When trying to query very large traces in Grafana, it gives the following error.
Error message in Grafana:

    failed to get trace with id: 19d5517f5f92a4c58286b227773b81cc
    Status: 500 Internal Server Error Body: error querying ingesters in Querier.FindTraceByID:
    rpc error: code = ResourceExhausted desc = grpc: received message after decompression larger than max (104857601 vs. 104857600)
This is in spite of the following being in my Tempo configuration file.

Tempo config (yaml):

    server:
      http_listen_port: 3200
      grpc_server_max_recv_msg_size: 200_000_000
      grpc_server_max_send_msg_size: 200_000_000

    querier:
      frontend_worker:
        grpc_client_config:
          max_recv_msg_size: 200_000_000
          max_send_msg_size: 200_000_000
Is there another component that also needs to be updated with this new value?
To Reproduce
Steps to reproduce the behavior:
- Run the Tempo Docker container with the configuration above and generate a very large trace (a minimal compose sketch follows below).
  docker.io/grafana/tempo latest b11ca097701b 5 hours ago 81.6 MB
- Query the trace in Grafana.
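For reference, a minimal docker-compose sketch for running this locally. The file name tempo.yaml, the mount path, and the published port are illustrative assumptions rather than details from the original report:

    services:
      tempo:
        image: docker.io/grafana/tempo:latest
        # Pass the config shown above; -config.file is the same flag used later in this thread.
        command: ["-config.file=/etc/tempo/tempo.yaml"]
        volumes:
          # Assumed local file name; adjust to wherever the config above is saved.
          - ./tempo.yaml:/etc/tempo/tempo.yaml:ro
        ports:
          - "3200:3200"   # matches http_listen_port in the config above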
Expected behavior
If the query in Grafana fails with this size limit, I would expect it to use the configured limit. Instead, this limit looks exactly like the default limit of 100 << 20 (i.e. 104857600) that is being set in

Environment:
- Infrastructure: bare-metal
- Deployment tool: docker-compose

Came here to post a similar issue. Looks like the recv & send sizes are hard-coded and can't be set from configuration options, if I'm reading this right.
https://github.com/grafana/tempo/blob/main/modules/querier/config.go#L46-L47
Using the config file provided in the original post, I am running:

    go run ./cmd/tempo --storage.trace.backend=local --storage.trace.wal.path=/tmp/tempo/wal --storage.trace.local.path=/tmp/tempo/traces --config.file=test.yaml
I can then curl http://localhost:3200/status/config and see that the values have been set correctly:
    GET /status/config
    ---
    target: all
    metrics_generator_enabled: false
    http_api_prefix: ""
    server:
      grpc_server_max_recv_msg_size: 200000000
      grpc_server_max_send_msg_size: 200000000
    ...
    querier:
      frontend_worker:
        grpc_client_config:
          max_recv_msg_size: 200000000
          max_send_msg_size: 200000000
So the values seem to be loaded correctly from the config file. I also put some debug print lines where we instantiate the frontend worker, and it seemed to correctly pass this config to gRPC when instantiating the connection.
When you pass the config options shown, can you confirm that /status/config is showing the expected values?
I can confirm that /status/config shows the expected values (which I had in my configuration file), while I still get the error shown in the description.
Working through the errors, I adjusted (in this order; a combined sketch follows below):
- ingester_client > grpc_client_config > max_recv_msg_size to 200000000
- querier > frontend_worker > grpc_client_config > max_send_msg_size to 200000000
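Pulling the settings named so far in this thread into one place, here is a hedged sketch of the combined configuration. Only options quoted above are included, and 200_000_000 is simply the value used in the original report, not a recommendation:

    server:
      grpc_server_max_recv_msg_size: 200_000_000
      grpc_server_max_send_msg_size: 200_000_000

    querier:
      frontend_worker:
        grpc_client_config:
          max_recv_msg_size: 200_000_000
          max_send_msg_size: 200_000_000

    ingester_client:
      grpc_client_config:
        # Limit on the client used to talk to the ingesters, per the comment above.
        max_recv_msg_size: 200_000_000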
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.
I've hit this same issue; please make these fields configurable.
Aah, I think the issue is caused by a different config:

    query_frontend:
      search:
        target_bytes_per_job: 104857600

Increasing this fixed it for me.
Sorry, this is incorrect, I was loading a smaller trace by accident when testing it.
Is this still an issue for folks, or do we have a workaround?
This is still an issue no matter which value I set for the max receive size: the incoming message is always 1 byte over the limit, and it generates a log like:

    {"caller":"rate_limited_logger.go:27","err":"rpc error: code = ResourceExhausted desc = grpc: received message after decompression larger than max (20000002 vs. 20000001)","level":"error","msg":"pusher failed to consume trace data","ts":"2024-03-19T10:31:14.658927083Z"}
For context, the above error comes from the distributor.
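For anyone debugging the distributor-side variant of this error, here is a hedged sketch of the limits that sit on that push path, based only on the sections already quoted in this thread. Whether these particular options resolve the log above is not confirmed here, and max_send_msg_size under ingester_client is an assumption by analogy with the frontend_worker client config:

    server:
      # Server-side receive limit (the "received message ... larger than max" side).
      grpc_server_max_recv_msg_size: 200_000_000
    ingester_client:
      grpc_client_config:
        # Assumed to exist here as it does under querier.frontend_worker above.
        max_send_msg_size: 200_000_000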
Are you in a position to test the change in https://github.com/grafana/tempo/pull/3208? I can build an image if that's helpful. If so, you will want to read the changelog from your current version up to the image version.