Error in ingester: cannot grow buffer beyond 2 gigabytes
Describe the bug
Tempo ingesters are failing with the error below:
panic: cannot grow buffer beyond 2 gigabytes
goroutine 326 [running]:
github.com/google/flatbuffers/go.(*Builder).growByteBuffer(...)
/drone/src/vendor/github.com/google/flatbuffers/go/builder.go:214
github.com/google/flatbuffers/go.(*Builder).Prep(0x41066185b0, 0x4, 0x36b45)
/drone/src/vendor/github.com/google/flatbuffers/go/builder.go:268 +0x2c0
github.com/google/flatbuffers/go.(*Builder).CreateString(0x41066185b0, {0x415fa04000, 0x36b44})
/drone/src/vendor/github.com/google/flatbuffers/go/builder.go:337 +0x4c
github.com/google/flatbuffers/go.(*Builder).CreateSharedString(0x41066185b0, {0x415fa04000, 0x36b44})
/drone/src/vendor/github.com/google/flatbuffers/go/builder.go:327 +0xbc
github.com/grafana/tempo/pkg/tempofb.writeToBuilder(0x41066185b0, {0x406cbbaa80, 0x37, 0x37}, 0x4041845888)
/drone/src/pkg/tempofb/searchdatamap.go:159 +0x440
github.com/grafana/tempo/pkg/tempofb.SearchDataMapLarge.WriteToBuilder(0x404b0524e0, 0x41066185b0)
/drone/src/pkg/tempofb/searchdatamap.go:133 +0x194
github.com/grafana/tempo/pkg/tempofb.(*SearchBlockHeaderMutable).ToBytes(0x4041845ad8)
/drone/src/pkg/tempofb/SearchBlockHeader_util.go:66 +0xfc
github.com/grafana/tempo/tempodb/search.NewBackendSearchBlock(0x4000928150, {0x227dfc8, 0x400011d5f0}, {0x97, 0x23, 0x35, 0xaf, 0x2a, 0x2f, 0x43, ...}, ...)
/drone/src/tempodb/search/backend_search_block.go:113 +0x5dc
github.com/grafana/tempo/tempodb.(*readerWriter).CompleteSearchBlockWithBackend(0x4040ca09c0, 0x4000928150, {0x97, 0x23, 0x35, 0xaf, 0x2a, 0x2f, 0x43, 0x4b, ...}, ...)
/drone/src/tempodb/tempodb.go:266 +0x74
github.com/grafana/tempo/modules/ingester.(*instance).CompleteBlock(0x4000663440, {0x97, 0x23, 0x35, 0xaf, 0x2a, 0x2f, 0x43, 0x4b, 0xa9, ...})
/drone/src/modules/ingester/instance.go:286 +0x328
github.com/grafana/tempo/modules/ingester.(*Ingester).handleComplete(0x400031ec00, 0x4000797540)
/drone/src/modules/ingester/flush.go:252 +0x290
github.com/grafana/tempo/modules/ingester.(*Ingester).flushLoop(0x400031ec00, 0x3)
/drone/src/modules/ingester/flush.go:205 +0xb8
created by github.com/grafana/tempo/modules/ingester.New
/drone/src/modules/ingester/ingester.go:80 +0x1c8
To Reproduce
Steps to reproduce the behavior:
- This started happening after a few days of running Tempo 1.2.
- I am able to overcome it only by removing the PVC and restarting the ingester pods.
Expected behavior
Ingesters should keep running without panicking.
Environment:
- Infrastructure: Kubernetes
- Deployment tool: helm, chart version: 0.13.0
- Version: 1.2.0
Additional Context
Here are the Helm values I am using:
config: |
search_enabled: true
multitenancy_enabled: true
compactor:
compaction:
block_retention: 168h
ring:
kvstore:
store: memberlist
querier:
frontend_worker:
frontend_address: {{ include "tempo.queryFrontendFullname" . }}-discovery:9095
grpc_client_config:
max_recv_msg_size: 1.34217728e+08
max_send_msg_size: 1.34217728e+08
ingester:
lifecycler:
ring:
replication_factor: 1
kvstore:
store: memberlist
tokens_file_path: /var/tempo/tokens.json
complete_block_timeout: 10m
memberlist:
abort_if_cluster_join_fails: true
join_members:
- {{ include "tempo.fullname" . }}-gossip-ring
overrides:
max_traces_per_user: 0
max_bytes_per_trace: 3000000
ingestion_rate_strategy: global
ingestion_burst_size_bytes: 40000000
ingestion_rate_limit_bytes: 30000000
server:
grpc_server_max_recv_msg_size: 1.34217728e+08
grpc_server_max_send_msg_size: 1.34217728e+08
http_listen_port: 3100
storage:
trace:
backend: {{.Values.storage.trace.backend}}
s3:
{{- toYaml .Values.storage.trace.s3 | nindent 6}}
blocklist_poll: 5m
local:
path: /var/tempo/traces
wal:
path: /var/tempo/wal
cache: memcached
memcached:
consistent_hash: true
host: {{ include "tempo.fullname" . }}-memcached
service: memcached-client
timeout: 500ms
Hi, thanks for reporting this. It seems that this error is occurring while writing the search-header
file to the ingester disk. 2GB is much larger than expected for that; those files are typically quite small (20MB or less). We will try to protect against this panic in a future release, but for now let's investigate what is different about your traffic and check some relevant config options.
1. Can you provide a full file listing of /var/tempo/wal? We particularly want to see how large the blocks being flushed by the ingester are, and how large the search and search-header files are.
2. What is your traffic volume, in spans/s and bytes/s? These are reported by the metrics tempo_distributor_spans_received_total and tempo_distributor_bytes_received_total.
3. Are there any very large span attributes? For example, if the trace includes a copy of the HTTP request and response bodies, that could inflate the size of the search file. We can exclude these tags from search by setting the distributor:search_tags_deny_list (a placement sketch follows the example below). Additionally there might be other tags that are not useful to your workflow, and excluding these would also cut down on the file size. For example, here are the tags we exclude ourselves:
search_tags_deny_list:
- opencensus.exporterversion
- sampler.type
- sampler.param
- client-uuid
- component
- ip
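For reference, a rough sketch of where this list sits in the overall config, assuming the tempo.yaml structure shown above (the tag names are just the examples from this comment, and the inline comment is illustrative):
distributor:
  # tags listed here are excluded from the search data the ingesters write
  search_tags_deny_list:
    - opencensus.exporterversion
    - sampler.type
    - sampler.param
    - client-uuid
    - component
    - ip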
The issue happened again:
when it reached 1.4 B spans, it went for a toss. We are running with 5 ingesters and 5 distributors, and I don't see much CPU or memory usage.
Hi @mdisibio, we are writing large volumes of data, running load tests with thousands of requests per minute continuously. Can you suggest an appropriate configuration to flush the data more frequently?
Hi. Could you repost those graphs with rates, like sum(rate(tempo_distributor_spans_received_total[1m]))?
These are the flush-related settings that we use and are known to work well up to 2M spans/s.
ingester:
flush_check_period: 5s
trace_idle_period: 5s
max_block_duration: 15m0s
max_block_bytes: 734003200
Of these, max_block_duration is the most relevant; it defaults to 1 hour. If 15m isn't enough, try 5m (our ingesters actually flush blocks even faster than that, about every 2 minutes); a 5m variant is sketched below. Just check that your compactors are able to keep up with the higher number of blocks being generated.
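A minimal sketch of that tighter variant, keeping the other settings from the block above unchanged (the inline comment is illustrative, not from this thread):
ingester:
  flush_check_period: 5s
  trace_idle_period: 5s
  # cut a new block every 5 minutes; check that compactors keep up
  max_block_duration: 5m0s
  max_block_bytes: 734003200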
hi @mdisibio
(spans-rate graph attached)
we have just applied your suggested configuration and will report back.
hi @mdisibio,
We have observed a new issue after the recent changes.
Issue: Status: 404 Not Found Body: trace not found
We cannot see the trace after a few minutes.
- So we restarted the ingesters.
- After restarting the ingesters, we were able to see the missing traces.
Current configuration
overrides.yaml: |
overrides: {}
tempo-query.yaml: |
backend: 127.0.0.1:3100
tempo.yaml: |
search_enabled: true
multitenancy_enabled: true
compactor:
compaction:
block_retention: 336h
compacted_block_retention: 30m
compaction_window: 30m
ring:
kvstore:
store: memberlist
distributor:
ring:
kvstore:
store: memberlist
receivers:
otlp:
protocols:
http:
endpoint: 0.0.0.0:55681
grpc:
endpoint: 0.0.0.0:4317
querier:
frontend_worker:
frontend_address: tempo-query-frontend-discovery:9095
grpc_client_config:
max_recv_msg_size: 1.34217728e+08
max_send_msg_size: 1.34217728e+08
ingester:
lifecycler:
ring:
replication_factor: 1
kvstore:
store: memberlist
tokens_file_path: /var/tempo/tokens.json
complete_block_timeout: 10m
flush_check_period: 5s
trace_idle_period: 5s
max_block_duration: 15m0s
max_block_bytes: 734003200
memberlist:
abort_if_cluster_join_fails: true
join_members:
- tempo-gossip-ring
overrides:
max_traces_per_user: 0
max_bytes_per_trace: 2000000
ingestion_rate_strategy: local
ingestion_burst_size_bytes: 40000000
ingestion_rate_limit_bytes: 30000000
server:
grpc_server_max_recv_msg_size: 1.34217728e+08
grpc_server_max_send_msg_size: 1.34217728e+08
http_listen_port: 3100
storage:
trace:
backend: s3
s3:
bucket: xxxxxxxxxxxxxxxxxxxxxxxx
endpoint: xxxxxxxxxxxxxxxxxxxxxxxxxx
pool:
max_workers: 50
queue_depth: 20000
blocklist_poll: 5m
local:
path: /var/tempo/traces
wal:
path: /var/tempo/wal
cache: memcached
memcached:
consistent_hash: true
host: tempo-memcached
service: memcached-client
timeout: 500ms
When a trace disappears and then reappears, it is usually explained by a mismatch between component timings, for example a querier that cannot yet see a block. To start, I would adjust the setting below. The default is 3x blocklist_poll, which gives more buffer for the tenant index to be built and for queriers to pick it up. This addresses the scenario where an ingester pushes a block to the backend and deletes its local copy before queriers see the new block.
ingester:
complete_block_timeout: 15m
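For context, a rough sketch of how the two timings relate, using the blocklist_poll of 5m from the config above (the comments are illustrative, not from the maintainer):
storage:
  trace:
    blocklist_poll: 5m    # how often the backend blocklist / tenant index is polled
ingester:
  # keep completed blocks on local disk for roughly 3x blocklist_poll,
  # so queriers discover a flushed block before the ingester deletes its copy
  complete_block_timeout: 15m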
There are a few more ways to dig deeper if it is still occurring.
(1) Search the querier logs for messages like searching for trace in block when you get a trace not found. It will list all of the blocks checked. If it is sometimes working and sometimes not, then we can see which block(s) contain the trace and whether they are being searched each time.
(2) Capture tracing data for Tempo itself, and look up the failed request that was logged in (1). The log message will contain another trace ID, which is Tempo's own trace. We can inspect the querier and ingester work from there.
Hi, this issue seems to have moved on from the originally reported flatbuffer error. If you can provide the requested information, we can keep troubleshooting; otherwise I think it makes sense to close this issue and open new ones for the other problems. We can also always provide general help in the forums or the Slack channel.
- Can you provide a full file listing of /var/tempo/wal? We particularly want to see how large the blocks being flushed by the ingester are, and how large the search and search-header files are.
- Are there any quite large span attributes? For example if the trace includes a copy of the HTTP request and response bodies, that could inflate the size of the search file. We can exclude these tags from search by setting the distributor:search_tags_deny_list.
We had this happen to us just now, so here's the /var/tempo/wal listing (ls -ltR):
$ kubectl -n tempo exec -it ingester-0 -- ls -ltR /var/tempo/wal
/var/tempo/wal:
total 1062500
-rw-r--r-- 1 root root 12271199 May 6 07:06 20368c0c-67d6-4001-a8db-80378a6cb4b3:single-tenant:v2:snappy:v2
drwxr-xr-x 2 root root 4096 May 6 07:06 search
-rw-r--r-- 1 root root 1075715540 May 6 06:44 e2ce9212-aae4-4eb4-8a53-a045423d97b1:single-tenant:v2:snappy:v2
drwxr-xr-x 3 root root 4096 May 6 04:27 blocks
/var/tempo/wal/search:
total 3451792
-rw-r--r-- 1 root root 29068476 May 6 07:06 20368c0c-67d6-4001-a8db-80378a6cb4b3:single-tenant:v2:none:
-rw-r--r-- 1 root root 3505560048 May 6 06:44 e2ce9212-aae4-4eb4-8a53-a045423d97b1:single-tenant:v2:none:
/var/tempo/wal/blocks:
total 4
drwxr-xr-x 7 root root 4096 May 6 07:06 single-tenant
/var/tempo/wal/blocks/single-tenant:
total 20
drwxr-xr-x 2 root root 4096 May 6 07:06 8a76fb3e-bd5e-4a2f-97fb-c77e4c245f91
drwxr-xr-x 2 root root 4096 May 6 06:59 fa9cdc9f-59a3-4929-b83c-075fe08a48ee
drwxr-xr-x 2 root root 4096 May 6 06:55 06ab2165-1b64-45a7-947e-ea9d9e673e78
drwxr-xr-x 2 root root 4096 May 6 06:51 6b1feb85-da86-44a8-90ed-e5a2e4528535
drwxr-xr-x 2 root root 4096 May 6 06:50 a1512bb6-a54a-4459-ba01-c34ea7ff95b3
/var/tempo/wal/blocks/single-tenant/8a76fb3e-bd5e-4a2f-97fb-c77e4c245f91:
total 7092
-rw-r--r-- 1 root root 102424 May 6 07:06 bloom-0
-rw-r--r-- 1 root root 6887984 May 6 07:06 data
-rw-r--r-- 1 root root 30 May 6 07:06 flushed
-rw-r--r-- 1 root root 256000 May 6 07:06 index
-rw-r--r-- 1 root root 381 May 6 07:06 meta.json
/var/tempo/wal/blocks/single-tenant/fa9cdc9f-59a3-4929-b83c-075fe08a48ee:
total 8752
-rw-r--r-- 1 root root 102424 May 6 06:59 bloom-0
-rw-r--r-- 1 root root 8585334 May 6 06:59 data
-rw-r--r-- 1 root root 30 May 6 06:59 flushed
-rw-r--r-- 1 root root 256000 May 6 06:59 index
-rw-r--r-- 1 root root 381 May 6 06:59 meta.json
/var/tempo/wal/blocks/single-tenant/06ab2165-1b64-45a7-947e-ea9d9e673e78:
total 7576
-rw-r--r-- 1 root root 30 May 6 06:55 flushed
-rw-r--r-- 1 root root 102424 May 6 06:54 bloom-0
-rw-r--r-- 1 root root 7382349 May 6 06:54 data
-rw-r--r-- 1 root root 256000 May 6 06:54 index
-rw-r--r-- 1 root root 381 May 6 06:54 meta.json
/var/tempo/wal/blocks/single-tenant/6b1feb85-da86-44a8-90ed-e5a2e4528535:
total 5596
-rw-r--r-- 1 root root 102424 May 6 06:51 bloom-0
-rw-r--r-- 1 root root 5354751 May 6 06:51 data
-rw-r--r-- 1 root root 30 May 6 06:51 flushed
-rw-r--r-- 1 root root 256000 May 6 06:51 index
-rw-r--r-- 1 root root 380 May 6 06:51 meta.json
/var/tempo/wal/blocks/single-tenant/a1512bb6-a54a-4459-ba01-c34ea7ff95b3:
total 6860
-rw-r--r-- 1 root root 102424 May 6 06:50 bloom-0
-rw-r--r-- 1 root root 6650228 May 6 06:50 data
-rw-r--r-- 1 root root 30 May 6 06:50 flushed
-rw-r--r-- 1 root root 256000 May 6 06:50 index
-rw-r--r-- 1 root root 381 May 6 06:50 meta.json
The PV is
pvc-b816510b-0670-49cd-980b-71bdb2f5183b 100Gi RWO Delete Bound tempo/ingester-data-ingester-0 standard-block-storage 4h11m
and has
$ kubectl -n tempo exec -it ingester-0 -- du -sm /var/tempo
4680 /var/tempo
So it should have plenty of space.
Some logs:
level=info ts=2022-05-06T07:06:10.183916094Z caller=flush.go:244 msg="completing block" userid=single-tenant blockID=8a76fb3e-bd5e-4a2f-97fb-c77e4c245f91
level=info ts=2022-05-06T07:06:11.089084194Z caller=flush.go:251 msg="block completed" userid=single-tenant blockID=8a76fb3e-bd5e-4a2f-97fb-c77e4c245f91 duration=905.177258ms
level=info ts=2022-05-06T07:06:11.096297342Z caller=flush.go:300 msg="flushing block" userid=single-tenant block=8a76fb3e-bd5e-4a2f-97fb-c77e4c245f91
panic: cannot grow buffer beyond 2 gigabytes
goroutine 289 [running]:
github.com/google/flatbuffers/go.(*Builder).growByteBuffer(...)
/drone/src/vendor/github.com/google/flatbuffers/go/builder.go:214
github.com/google/flatbuffers/go.(*Builder).Prep(0xc0e5679710, 0x4, 0x5ee7)
/drone/src/vendor/github.com/google/flatbuffers/go/builder.go:268 +0x2b8
github.com/google/flatbuffers/go.(*Builder).CreateString(0xc0e5679710, {0xc0aa9d0000, 0x5ee6})
/drone/src/vendor/github.com/google/flatbuffers/go/builder.go:337 +0x4b
github.com/google/flatbuffers/go.(*Builder).CreateSharedString(0xc0e5679710, {0xc0aa9d0000, 0x5ee6})
/drone/src/vendor/github.com/google/flatbuffers/go/builder.go:327 +0xa5
EDIT: we have
max_block_bytes: 1073741824,
max_block_duration: '1h',
trace_idle_period: '10s',
and flush_check_period is unset (so whatever the default is).
Hi, thanks for the information! Sorry that this is happening; my thought is that it's very traffic-dependent, and not necessarily a bug. We haven't seen this ourselves. If this is causing immediate trouble, the best workaround is to disable search and delete the files from /var/tempo/wal/search; a minimal config sketch follows.
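For reference, the config side of that workaround, assuming the top-level search_enabled flag used in the tempo.yaml above (the files already written under /var/tempo/wal/search still have to be removed by hand on each ingester volume):
# temporary workaround: stop generating search data in the ingesters
search_enabled: false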
1. Was search recently enabled? I notice that there are no search-related files under /var/tempo/wal/blocks/single-tenant/; there should be a search.meta.json, for example. This question will help us understand whether things are working some of the time or not at all. When fully working it looks like this:
/var/tempo/wal/blocks/1/dd07acdd-7622-49fe-bd73-1117e2e0a752:
total 670M
-rw-r--r-- 1 root root 97.7K May 6 13:50 bloom-0
-rw-r--r-- 1 root root 97.7K May 6 13:50 bloom-1
-rw-r--r-- 1 root root 97.7K May 6 13:50 bloom-2
-rw-r--r-- 1 root root 590.9M May 6 13:50 data
-rw-r--r-- 1 root root 29 May 6 13:50 flushed
-rw-r--r-- 1 root root 250.0K May 6 13:50 index
-rw-r--r-- 1 root root 375 May 6 13:50 meta.json
-rw-r--r-- 1 root root 59.0M May 6 13:50 search
-rw-r--r-- 1 root root 19.6M May 6 13:50 search-header
-rw-r--r-- 1 root root 100.0K May 6 13:50 search-index
-rw-r--r-- 1 root root 77 May 6 13:50 search.meta.json
2. If you have the complete stack trace, that will be helpful, as we can see which file is erroring. There are 3 locations where flatbuffers are used.
3. The large size of the files under /var/tempo/wal/search is very surprising compared to the size of the matching WAL block. For example, wal/search/e2ce9212-aae4-4eb4-8a53-a045423d97b1:single-tenant:v2:none: is 3.5GB but wal/e2ce9212-aae4-4eb4-8a53-a045423d97b1:single-tenant:v2:snappy:v2 is only 1.1GB. The search files contain a subset of the data, so they are usually smaller (even without compression), not 300% larger.
Internally our ratio looks like this:
# ls -lRh /var/tempo/wal
/var/tempo/wal:
-rw-r--r-- 1 root root 42.4M May 6 14:03 0ffafe61-c321-46e1-9875-4b76f07afda2:1:v2:snappy:v2
-rw-r--r-- 1 root root 709.8M May 6 14:03 f683d609-c666-45e8-bb99-aa122ed618a8:1:v2:snappy:v2
/var/tempo/wal/search:
-rw-r--r-- 1 root root 17.2M May 6 14:03 0ffafe61-c321-46e1-9875-4b76f07afda2:1:v2:none:
-rw-r--r-- 1 root root 275.3M May 6 14:03 f683d609-c666-45e8-bb99-aa122ed618a8:1:v2:none:
~60% smaller.
Please see question (3) in the comment above. Do your traces include any very large attributes like HTTP request/response bodies? If so, we can stop them from being recorded by setting search_tags_deny_list. Removing smaller tags that aren't searched will also help, such as client-uuid, etc.
The 2GB limit is a hard limit that can't be changed, so the goal is to determine what is using so much space and eliminate it.
We've had search enabled for a while now, but this seems to have started happening either because of the upgrade to 1.4.1, or because we switched to tail-based sampling in otel-collector (those two happened around the same time).
I'm not aware of any huge tags, though I did find one Java command line that was quite large and I excluded it. Is there a way to look at the data in the search files to find any large tags and/or verify that my exclusion worked?
The crash just happened again (sadly I deleted the PVs and recreated the ingester StatefulSet to get things working again, so I can't get a directory listing; I have a feeling I can get it later, when it crashes again, though):
level=info ts=2022-05-09T04:42:26.399844575Z caller=flush.go:300 msg="flushing block" userid=single-tenant block=d494cad4-2176-408e-984e-9e1ed1d4bc1e
panic: cannot grow buffer beyond 2 gigabytes
goroutine 321 [running]:
github.com/google/flatbuffers/go.(*Builder).growByteBuffer(...)
/drone/src/vendor/github.com/google/flatbuffers/go/builder.go:214
github.com/google/flatbuffers/go.(*Builder).Prep(0xc000855910, 0x4, 0x5fe1)
/drone/src/vendor/github.com/google/flatbuffers/go/builder.go:268 +0x2b8
github.com/google/flatbuffers/go.(*Builder).CreateString(0xc000855910, {0xc0b71a6000, 0x5fe0})
/drone/src/vendor/github.com/google/flatbuffers/go/builder.go:337 +0x4b
github.com/google/flatbuffers/go.(*Builder).CreateSharedString(0xc000855910, {0xc0b71a6000, 0x5fe0})
/drone/src/vendor/github.com/google/flatbuffers/go/builder.go:327 +0xa5
github.com/grafana/tempo/pkg/tempofb.writeKeyValues(0xc000855910, {0xc0b9bffc90, 0x80}, {0xc0f6f2a000, 0xbf00, 0xec00}, {0x22866c0, 0xc0eee4b4f0}, 0x0)
/drone/src/pkg/tempofb/searchdatamap.go:133 +0x318
github.com/grafana/tempo/pkg/tempofb.WriteSearchDataMap(0xe48, 0x1dccc98, 0xc)
/drone/src/pkg/tempofb/searchdatamap.go:89 +0x2b8
github.com/grafana/tempo/pkg/tempofb.(*SearchBlockHeaderMutable).ToBytes(0xc000855b08)
/drone/src/pkg/tempofb/SearchBlockHeader_util.go:66 +0xd3
github.com/grafana/tempo/tempodb/search.NewBackendSearchBlock(0x40, {0x2286948, 0xc040b06120}, {0x15, 0xb0, 0x86, 0x6d, 0xaa, 0xcb, 0x41, ...}, ...)
/drone/src/tempodb/search/backend_search_block.go:115 +0x656
github.com/grafana/tempo/tempodb.(*readerWriter).CompleteSearchBlockWithBackend(0xbb41cbaa6d86b015, 0xd1de026f6f7c8880, {0x15, 0xb0, 0x86, 0x6d, 0xaa, 0xcb, 0x41, 0xbb, ...}, ...)
/drone/src/tempodb/tempodb.go:271 +0x85
github.com/grafana/tempo/modules/ingester.(*instance).CompleteBlock(0xc0007939e0, {0x15, 0xb0, 0x86, 0x6d, 0xaa, 0xcb, 0x41, 0xbb, 0x80, ...})
/drone/src/modules/ingester/instance.go:311 +0x2a4
github.com/grafana/tempo/modules/ingester.(*Ingester).handleComplete(0xc000282c00, 0xc040f86370)
/drone/src/modules/ingester/flush.go:250 +0x25c
github.com/grafana/tempo/modules/ingester.(*Ingester).flushLoop(0xc000282c00, 0x1)
/drone/src/modules/ingester/flush.go:203 +0xc7
created by github.com/grafana/tempo/modules/ingester.New
/drone/src/modules/ingester/ingester.go:89 +0x1e9
BTW: once an ingester gets into this state, it seems to be hosed until I manually "delete the large file" (i.e. kill the PV everything is stored on, since I don't think I should be messing with ingester-internal files with a scalpel). If nothing else, it might be good to fix it so that ingesters can recover without manual intervention (perhaps that's a tall order, though).
Suggestion 2: perhaps there should be (or perhaps there already is? Let me know!) some setting to reject large tags, or to stop appending to a file that is already too big, so we don't get into this unrecoverable state to begin with.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.