tempo Status: 500 Internal Server Error Body: too many failed block queries 5


Describe the bug

I have recently replicated the data from the current S3 bucket to a new S3 bucket. When querying against data in this new S3 bucket, I am seeing the error "failed to get trace with id: d0504c815a12942af2fb23fbdd6fdee3 Status: 500 Internal Server Error Body: too many failed block queries 5 (max 0)".

How may I troubleshoot/fix this? Thanks.

To Reproduce

Steps to reproduce the behavior:

1. Replicate data from the old S3 bucket to the new S3 bucket
2. Configure Tempo to use the new S3 bucket
3. Run a query

Apr 13 '22 07:04 chenfeilee

Please check your querier logs for errors when pulling data from the S3 bucket. Perhaps this is a permissions or polling issue?

Apr 13 '22 12:04 joe-elliott

Thanks for the response.

I don't see any errors in the querier logs, though. There doesn't seem to be a permissions issue, since I can see log lines like this one from the querier:

level=info ts=2022-04-14T01:02:00.01027576Z caller=tempodb.go:348 org_id=single-tenant msg="searching for trace in block" findTraceID=00000000000000006002a1f980d7e12c block=0092d0a0-cb7a-4156-8958-ca9bc5a75685 found=false

Some queries go through, but most of them fail with "too many failed block queries".

Apr 14 '22 01:04 chenfeilee

I tried running tempo-cli to query the blocks in the S3 backend directly, and it looks like some of the blocks in the S3 bucket are corrupted. How can I single those blocks out? Thanks.

Apr 14 '22 01:04 chenfeilee

Well, we should be able to see at least some failures in the querier logs that would help us understand why the blocks are failing.

Perhaps the replication copied over blocks in the process of being created and these partial blocks are a problem? I'd really like to find the log messages from the querier so we can feel very confident we understand why the blocks are failing.
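If partial or corrupted blocks are suspected, one way to single them out is to list every object key in the bucket and check each block directory for the files a complete block is expected to have. The sketch below assumes the `<tenant>/<blockID>/<file>` object layout that appears in the `bloom-0` paths of querier errors elsewhere in this thread, and it assumes (hypothetically) that every complete block contains at least `meta.json` and `bloom-0`. `REQUIRED_FILES` and `find_incomplete_blocks` are illustrative names, not part of Tempo; adjust the required set for your Tempo version and block format.

```python
from collections import defaultdict

# Hypothetical: files assumed to be present in every complete block.
# Adjust for your Tempo version and block format (e.g. v2 vs vParquet).
REQUIRED_FILES = {"meta.json", "bloom-0"}


def find_incomplete_blocks(keys, required=REQUIRED_FILES):
    """Group object keys by (tenant, blockID) and report missing files.

    Assumes keys look like "<tenant>/<blockID>/<file>".
    Returns {(tenant, block_id): [sorted missing file names]}.
    """
    files_by_block = defaultdict(set)
    for key in keys:
        parts = key.strip().split("/")
        if len(parts) != 3:
            continue  # skip objects that are not inside a block directory
        tenant, block_id, filename = parts
        files_by_block[(tenant, block_id)].add(filename)

    incomplete = {}
    for block, files in sorted(files_by_block.items()):
        missing = required - files
        if missing:
            incomplete[block] = sorted(missing)
    return incomplete


if __name__ == "__main__":
    # Hypothetical keys: "block-ok" is complete, "block-partial" is not.
    sample = [
        "single-tenant/block-ok/meta.json",
        "single-tenant/block-ok/bloom-0",
        "single-tenant/block-ok/data",
        "single-tenant/block-partial/data",
    ]
    for (tenant, block_id), missing in find_incomplete_blocks(sample).items():
        print(f"{tenant}/{block_id}: missing {', '.join(missing)}")
```

To feed it a real bucket, the output of `aws s3 ls s3://<bucket> --recursive` has the object key as its last column, so `[line.split()[-1] for line in listing]` yields the key list (this assumes keys contain no spaces, which holds for the paths shown in this thread).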

Apr 14 '22 13:04 joe-elliott

@chenfeilee Did you ever resolve this? We randomly started seeing this in our cluster today, for both old and new data.

Sep 09 '22 22:09 Aaron-ML

"Failed block queries" could be due to a huge number of reasons. That log message is where the query frontend is counting up the failed blocks to see if it passes the tolerate_failed_blocks threshold. Your queriers should have real log messages that indicate why the blocks are not being queried correctly.
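The error text shows the shape of this check: "too many failed block queries 5 (max 0)" means 5 block queries failed while the tolerated maximum was 0. The sketch below illustrates that counting logic; it is not Tempo's implementation, and `BlockResult`, `TooManyFailedBlocks` and `check_failed_blocks` are hypothetical names. It assumes the request fails once the failure count exceeds the threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BlockResult:
    block_id: str
    error: Optional[str] = None  # None means the block query succeeded


class TooManyFailedBlocks(Exception):
    pass


def check_failed_blocks(results: List[BlockResult], tolerate_failed_blocks: int) -> int:
    """Count failed block queries and fail the request past the threshold.

    Returns the number of failed blocks when it is within the threshold.
    """
    failed = [r for r in results if r.error is not None]
    if len(failed) > tolerate_failed_blocks:
        # Only the count reaches this message; the per-block causes
        # live in the querier logs.
        raise TooManyFailedBlocks(
            f"too many failed block queries {len(failed)} (max {tolerate_failed_blocks})"
        )
    return len(failed)
```

With five failed blocks and a threshold of 0, `check_failed_blocks` raises with the same wording as the error in this issue. Because the frontend only aggregates a count, the per-block reasons have to be read from the queriers.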

Sep 12 '22 12:09 joe-elliott

Thanks for the response, and sorry for hijacking this issue. It ended up being Azure (should have assumed).

Sep 12 '22 18:09 Aaron-ML

Not sure if this will help, but I see the following in the logs:

error finding trace by id, blockID: e9fa6814-9b13-4879-851d-1475a7887b8d: error retrieving bloom (single-tenant, e9fa6814-9b13-4879-851d-1475a7887b8d): reading storage container: Get \"https://whatever.blob.core.windows.net/tempoprod/single-tenant/e9fa6814-9b13-4879-851d-1475a7887b8d/bloom-0?timeout=61\": dial tcp: lookup whatever.blob.core.windows.net on 10.0.0.10:53: dial udp 10.0.0.10:53: operation was canceled

This makes me believe that, in our case, the issue is with CoreDNS in Azure Kubernetes rather than with Tempo itself. Even worse, it seems Tempo was not even able to connect to it (if I understand the error message correctly).

Notes:

* The error complaining about "too many failed block queries" comes from the query frontend
* The error describing what actually happened is in the queriers
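Since the useful detail is in the querier errors, a small filter can pull the block ID and the underlying cause out of lines like the ones quoted in this thread, so that repeated failures on the same blocks, or a shared cause such as DNS, stand out. The regular expression and the category names below are assumptions based only on the error text shown in this issue; other Tempo versions may word these errors differently.

```python
import re
from collections import Counter

# Assumed format, based on the querier errors quoted in this thread:
#   error finding trace by id, blockID: <id>: <cause>
ERROR_RE = re.compile(r"error finding trace by id, blockID: ([0-9a-fA-F-]+): (.*)")


def classify(cause: str) -> str:
    """Bucket an error cause into a coarse, hypothetical category."""
    if "does not exist" in cause:
        return "missing object"
    if "dial udp" in cause or "lookup " in cause:
        return "dns"
    return "other"


def summarize(lines):
    """Return (per-category error counts, set of failing block IDs)."""
    categories = Counter()
    blocks = set()
    for line in lines:
        m = ERROR_RE.search(line)
        if not m:
            continue  # not a trace-by-ID block error
        block_id, cause = m.groups()
        blocks.add(block_id)
        categories[classify(cause)] += 1
    return categories, blocks
```

Running `summarize` over a saved querier log (any iterable of lines works, such as an open file) shows at a glance whether failures cluster on a few block IDs, which points at the bucket contents, or spread across many blocks with one cause, which points at infrastructure such as DNS.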

Sep 28 '22 06:09 mac2000

For Azure DNS issues please see this thread: https://github.com/grafana/tempo/issues/1462. It's long running but toward the bottom you will see some of the steps we took to resolve our issues.

Sep 28 '22 12:09 joe-elliott

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.

Nov 28 '22 00:11 github-actions[bot]

Hi, I'm having the same errors. The querier logs say "error retrieving bloom bloom-0 (single-tenant, e6f45a73-99c3-4803-a5eb-52ee3dda2dfa): does not exist", so I guess I have to check on the S3 side. It seems strange anyway, as other instrumentation pushes to the same S3 bucket without this issue.

Mar 07 '23 14:03 flenoir

Hello @flenoir, did you manage to solve this error? I'm seeing the same behavior.

* Backend: Azure
* Errors in logs: error finding trace by id, blockID: XXX: error retrieving bloom bloom-0 (single-tenant, XXX): does not exist;

Jun 13 '23 13:06 SkanderRedjel

We have since removed this feature:

https://github.com/grafana/tempo/pull/2416

This will make it far easier to pinpoint issues with the trace by id path. This change will be available in the next release of Tempo.

Jun 20 '23 18:06 joe-elliott

Having the same issue.

Jun 21 '23 06:06 man0s

Hi, deleting the S3 content seemed to solve the issue, but now I have the same issue again.

I'll look for a solution and post here whatever I find.

Sep 11 '23 15:09 flenoir

So, I tried some tempo-cli commands and got errors related to vParquet (tempo-cli: error: unsupported block version: vParquet) when running "list index" against an existing trace ID in my S3 bucket. In the end, it seems I solved it with a setting in the Grafana data source: under TraceID query, I enabled "use time range in query", and it works now.

I hope this helps anyone who has the same kind of issue.
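The "use time range in query" option restricts the trace-by-ID lookup to the selected time range, which presumably means fewer blocks are searched and fewer can fail. The same idea can be tried against Tempo's HTTP API directly. The sketch below assumes the `/api/traces/<traceID>` endpoint accepts optional `start` and `end` query parameters in Unix epoch seconds, as described in recent Tempo API documentation; verify this for your Tempo version. `trace_by_id_url` is a hypothetical helper and the base URL is a placeholder.

```python
from urllib.parse import urlencode


def trace_by_id_url(base_url: str, trace_id: str, start: int, end: int) -> str:
    """Build a trace-by-ID URL restricted to [start, end] (Unix epoch seconds).

    Assumes Tempo's /api/traces/<traceID> endpoint supports optional
    start/end query parameters; base_url is a hypothetical Tempo address.
    """
    if end < start:
        raise ValueError("end must not be before start")
    query = urlencode({"start": start, "end": end})
    return f"{base_url.rstrip('/')}/api/traces/{trace_id}?{query}"


if __name__ == "__main__":
    import time

    now = int(time.time())
    # Look only at the last hour instead of every block in the bucket.
    print(trace_by_id_url("http://tempo:3200", "d0504c815a12942af2fb23fbdd6fdee3", now - 3600, now))
```

The printed URL can be fetched with curl or any HTTP client to compare a time-bounded lookup with an unbounded one.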

Sep 12 '23 12:09 flenoir
