Failed to upload profile with error: EOF of trees cache
Agent can't upload profile.
upload profile: do http request: Post "http://xxx-pyroscope.xxx.com/ingest?aggregationType=&from=1644953850&name=xxx-xx-xx-test-499981905626664960%7B%7D&sampleRate=100&spyName=gospy&units=&until=1644953860": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
There are errors on pyroscope server.
time="2022-02-16T07:12:04.462396" level=error msg="trees cache for pyroscope.server.cpu{}:7:1564403200: EOF" file=" storage/storage_put.go:76"
time="2022-02-16T07:12:04.462490" level=error msg="trees cache for pyroscope.server.cpu{}:5:1644403200: EOF" file=" storage/storage_put.go:76"
time="2022-02-16T07:12:04.551946" level=error msg="trees cache for pyroscope.server.alloc_objects{}:5:1644403200: EOF" file=" storage/storage_put.go:76"
time="2022-02-16T07:12:04.554790" level=error msg="trees cache for pyroscope.server.alloc_space{}:7:1564403200: EOF" file=" storage/storage_put.go:76"
time="2022-02-16T07:12:04.554874" level=error msg="trees cache for pyroscope.server.alloc_space{}:5:1644403200: EOF" file=" storage/storage_put.go:76"
time="2022-02-16T07:12:04.557736" level=error msg="trees cache for pyroscope.server.inuse_objects{}:7:1564403200: EOF" file=" storage/storage_put.go:76"
time="2022-02-16T07:12:04.557839" level=error msg="trees cache for pyroscope.server.inuse_objects{}:5:1644403200: EOF" file=" storage/storage_put.go:76"
time="2022-02-16T07:12:04.647913" level=error msg="trees cache for pyroscope.server.inuse_space{}:5:1644403200: EOF" file=" storage/storage_put.go:76"
Hi @koolay, thanks for reporting, we'll look into this!
- Can you let us know what version of Pyroscope you're using?
- Also are you using the push or the pull integration?
- Does this happen all the time or just occasionally?
@Rperry2174
The Pyroscope image is pyroscope/pyroscope:0.8.0, and we're using push mode.
Looks like the data has been corrupted. Are these messages repeating with the same numbers in the key name (for example, pyroscope.server.cpu{}:5:1644403200), or are they changing? Do you see similar messages for applications other than pyroscope.server?
Could you please export and send us one of the affected chunks that cause the problem?
curl --fail -G -o /tmp/tree-eof --data-urlencode "k=t:pyroscope.server.cpu{}:5:1644403200" http://localhost:4040/debug/storage/export/trees
(It targets localhost, you may need to adjust the URL.)
The data is raw profile bytes, it does not contain any sensitive info like function names.
Also, could you please clarify which Go client you are using:
- github.com/pyroscope-io/client/pyroscope
- github.com/pyroscope-io/pyroscope/pkg/agent/profiler
It'd be interesting to know what the read timeout is, and to establish the causality between the timeout and the ingestion error, i.e.:
- is the timeout caused by the ingestion error that somehow doesn't complete the request?
- is the ingestion error caused by the timeout?
- are they unrelated?
@kolesnikovae The client is github.com/pyroscope-io/client v0.2.0.
@abeaumont I'm not sure that they are related.
Is it related to the size of the tree nodes? @kolesnikovae
I'd say it should work fine unless the tree size reaches hundreds of megabytes. The relation between the error message and the HTTP timeout is pretty indirect: the code that causes the error does not depend on the HTTP connection (if the client drops the connection, processing won't be interrupted), but, apparently, the server was unable to put the data into the storage in time.
My best guess is that the whole tree got corrupted due to an unexpected shutdown or because of a bug. Unfortunately, the EOF error message (no more data can be read) does not allow us to unambiguously identify the exact reason, which is why I'm asking you to provide us with the data sample.
Trees (profiles) are stored in the underlying KV database (BadgerDB), where the key looks like pyroscope.server.alloc_space{}:5:1644403200 and the value is the tree itself. The error message states that the tree could not be fetched from the intermediate cache, which in turn means that the tree is found in the DB but it either:
- can't be read from the database. To me this says the DB layer is affected, which is quite a rare occurrence thanks to BadgerDB's MVCC model: data is written in transactions, so in case of a failure we end up with either the previous tree version or nothing. I can't say I've seen anything similar.
- can't be deserialised. With 99.9% certainty it's a bug.