
Configuration for querier.frontend_worker.grpc_client_config.max_recv_msg_size doesn't seem to work?

Open mariusvniekerk opened this issue 2 years ago • 14 comments

Describe the bug

When trying to query very large traces in Grafana, it gives the following error.

Error message in grafana

failed to get trace with id: 19d5517f5f92a4c58286b227773b81cc 
Status: 500 Internal Server Error Body: error querying ingesters in Querier.FindTraceByID: 
rpc error: 
code = ResourceExhausted 
desc = grpc: received message after decompression larger than max (104857601 vs. 104857600)
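
For reference, the 104857600 in that message is exactly 100 << 20 bytes (100 MiB), so the returned trace was just one byte over the limit. A quick check of the arithmetic in Go:

package main

import "fmt"

func main() {
    // 100 << 20 shifts 100 left by 20 bits: 100 * 1,048,576 bytes = 100 MiB.
    fmt.Println(100 << 20) // prints 104857600, the limit shown in the error above
}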

This is in spite of the following being in my configuration file for tempo

tempo config yaml


server:
  http_listen_port: 3200
  grpc_server_max_recv_msg_size: 200_000_000
  grpc_server_max_send_msg_size: 200_000_000

querier:
  frontend_worker:
    grpc_client_config:
      max_recv_msg_size: 200_000_000
      max_send_msg_size: 200_000_000

Is there another component that also needs to be updated with this new value?
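
For background, each gRPC hop (query-frontend to querier, querier to ingester) has its own server-side and client-side limits, which is why raising one setting may not be enough. Below is a rough grpc-go sketch of the two kinds of knobs involved; it is not Tempo's actual wiring, and the address and size are placeholders:

package main

import (
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

const maxMsgSize = 200_000_000 // bytes, matching the config above

func main() {
    // Server side: roughly what grpc_server_max_recv_msg_size / grpc_server_max_send_msg_size control.
    srv := grpc.NewServer(
        grpc.MaxRecvMsgSize(maxMsgSize),
        grpc.MaxSendMsgSize(maxMsgSize),
    )
    defer srv.Stop()

    // Client side: roughly what a grpc_client_config block controls for one hop.
    // "localhost:9095" is only a placeholder target for this sketch.
    conn, err := grpc.Dial("localhost:9095",
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithDefaultCallOptions(
            grpc.MaxCallRecvMsgSize(maxMsgSize),
            grpc.MaxCallSendMsgSize(maxMsgSize),
        ),
    )
    if err != nil {
        panic(err)
    }
    defer conn.Close()
}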

To Reproduce

Steps to reproduce the behavior:

  1. Run the Tempo docker container with the configuration above and generate a very large trace.
docker.io/grafana/tempo                 latest        b11ca097701b  5 hours ago   81.6 MB
  2. Query the trace in Grafana.

Expected behavior

If the query in Grafana fails with this size limit, I would expect it to use the configured limit. Instead, this limit looks exactly like the default limit of 100 << 20 that is being set in the code (see the screenshot in the comment below).

Environment:

  • Infrastructure: bare-metal
  • Deployment tool: docker-compose

mariusvniekerk avatar Jul 22 '22 15:07 mariusvniekerk

(Screenshot showing where the default limit is set in the code.)

mariusvniekerk avatar Jul 22 '22 16:07 mariusvniekerk

Came here to post a similar issue. Looks like the recv & send sizes are hard-coded and can't be set from configuration options, if I'm reading this right.

https://github.com/grafana/tempo/blob/main/modules/querier/config.go#L46-L47
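
If those lines are defaults applied in code rather than hard overrides, values from the YAML file should still replace them, which matches what is reported later in this thread. A minimal sketch of that "apply defaults, then unmarshal the file over them" pattern, using a hypothetical struct rather than Tempo's actual types:

package main

import (
    "fmt"

    "gopkg.in/yaml.v2"
)

// Hypothetical struct mirroring a grpc_client_config block; not Tempo's actual type.
type grpcClientConfig struct {
    MaxRecvMsgSize int `yaml:"max_recv_msg_size"`
    MaxSendMsgSize int `yaml:"max_send_msg_size"`
}

func main() {
    // Step 1: defaults applied in code (what lines like the ones linked above appear to do).
    cfg := grpcClientConfig{
        MaxRecvMsgSize: 100 << 20, // example default
        MaxSendMsgSize: 16 << 20,  // example default
    }

    // Step 2: the YAML file is unmarshalled over the defaults, so values from the file win.
    raw := []byte("max_recv_msg_size: 200000000\nmax_send_msg_size: 200000000\n")
    if err := yaml.Unmarshal(raw, &cfg); err != nil {
        panic(err)
    }
    fmt.Println(cfg.MaxRecvMsgSize, cfg.MaxSendMsgSize) // 200000000 200000000
}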


s4v4g3 avatar Jul 25 '22 18:07 s4v4g3

Using the provided config file in the original post I am running:

go run ./cmd/tempo --storage.trace.backend=local --storage.trace.wal.path=/tmp/tempo/wal --storage.trace.local.path=/tmp/tempo/traces --config.file=test.yaml

I can then curl http://localhost:3200/status/config and see that the values have been set correctly:

GET /status/config
---
target: all
metrics_generator_enabled: false
http_api_prefix: ""
server:
    grpc_server_max_recv_msg_size: 200000000
    grpc_server_max_send_msg_size: 200000000
...
querier:
  frontend_worker:
    grpc_client_config:
      max_recv_msg_size: 200000000
      max_send_msg_size: 200000000

So the values seem to be loaded correctly from the config file. I also put some debug print statements where we instantiate the frontend worker, and the config seemed to be passed correctly to gRPC when the connection was instantiated.

When you pass the config options shown can you confirm that /status/config is showing the expected values?
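
For anyone who prefers to script that check, here is a small Go equivalent of the curl above; it assumes Tempo is listening on the http_listen_port from the config in the original post:

package main

import (
    "bufio"
    "fmt"
    "net/http"
    "strings"
)

func main() {
    // Fetch the running configuration from Tempo's status endpoint.
    resp, err := http.Get("http://localhost:3200/status/config")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // Print only the message-size settings to confirm what was actually loaded.
    sc := bufio.NewScanner(resp.Body)
    for sc.Scan() {
        line := sc.Text()
        if strings.Contains(line, "msg_size") {
            fmt.Println(strings.TrimSpace(line))
        }
    }
    if err := sc.Err(); err != nil {
        panic(err)
    }
}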

joe-elliott avatar Aug 02 '22 16:08 joe-elliott

I can confirm that /status/config shows the expected values (which I had in my configuration file), while I still get the error shown in the description.

Working through the errors, I adjusted (in this order):

  • ingester_client > grpc_client_config > max_recv_msg_size to 200000000
  • querier > frontend_worker > grpc_client_config > max_send_msg_size to 200000000

kago-dk avatar Sep 23 '22 15:09 kago-dk

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.

github-actions[bot] avatar Nov 23 '22 00:11 github-actions[bot]

I've hit this same issue; please make these fields configurable.

cameronbraid avatar Apr 19 '23 01:04 cameronbraid

Aah, I think the issue is caused by a different config:

query_frontend:
    search:
        target_bytes_per_job: 104857600

increasing this fixed it for me

cameronbraid avatar Apr 19 '23 01:04 cameronbraid

> Aah, I think the issue is caused by a different config:
>
>     query_frontend:
>         search:
>             target_bytes_per_job: 104857600
>
> increasing this fixed it for me

Sorry, this is incorrect; I was loading a smaller trace by accident when testing it.

cameronbraid avatar Apr 19 '23 01:04 cameronbraid

Is this still an issue for folks, or do we have a workaround?

zalegrala avatar Jan 30 '24 13:01 zalegrala

This is still an issue for me. No matter which value I set for the max receive size, the incoming message is always 1 byte over the limit, and it generates a log like:

{"caller":"rate_limited_logger.go:27","err":"rpc error: code = ResourceExhausted desc = grpc: received message after decompression larger than max (20000002 vs. 20000001)","level":"error","msg":"pusher failed to consume trace data","ts":"2024-03-19T10:31:14.658927083Z"}

For context, the above error comes from the distributor.

sy-be avatar Mar 19 '24 10:03 sy-be

Are you in a position to test the change in https://github.com/grafana/tempo/pull/3208? I can build an image if that's helpful. If so, you will want to read the changelog from your current version up to the image version.

zalegrala avatar Mar 19 '24 20:03 zalegrala