
Feature request: Download all trace IDs and associated span IDs

Open simonhf opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe.

Until recently I had been using the Jaeger all-in-one Docker container to collect traces, and the associated API to download all trace IDs and associated span IDs with this HTTP request:

$ wget -q -O - "http://localhost:16686/api/traces?service=<MY_SERVICE>&lookback=120m&prettyPrint=true&limit=999999"

Why download all the trace IDs and associated span IDs? To feed them into my own scripts for further analysis.
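For context, the kind of post-processing I mean is trivial once the JSON is in hand. A minimal sketch (the `data`/`traceID`/`spans`/`spanID` field names are taken from Jaeger's query API response shape; the sample body is made up for illustration):

```python
import json

def extract_ids(jaeger_json: str) -> dict:
    """Map each trace ID to its span IDs from a Jaeger /api/traces response."""
    data = json.loads(jaeger_json)
    return {
        trace["traceID"]: [span["spanID"] for span in trace.get("spans", [])]
        for trace in data.get("data", [])
    }

# A minimal made-up response body, just to show the shape:
sample = '{"data": [{"traceID": "abc123", "spans": [{"spanID": "s1"}, {"spanID": "s2"}]}]}'
print(extract_ids(sample))  # {'abc123': ['s1', 's2']}
```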

However, I have been looking at these docs [1] and it seems there is no equivalent for Tempo, or is there?

The closest I have come is a two-step process: 1. enable and use the Tempo search API to find all trace IDs, and then 2. issue a trace API request for each and every trace ID to retrieve its associated span IDs.
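That two-step process would look roughly like this. This is a sketch only: the base URL/port and the `/api/search` and `/api/traces/<traceID>` paths are assumptions based on Tempo's query API, and the parsing assumes the search response carries a `traces` list with `traceID` fields:

```python
import json
from urllib.request import urlopen

TEMPO = "http://localhost:3200"  # assumed Tempo query endpoint

def search_trace_ids(search_json: str) -> list:
    """Pull trace IDs out of a Tempo search response body."""
    return [t["traceID"] for t in json.loads(search_json).get("traces", [])]

def download_all(tags: str) -> dict:
    """Step 1: search for trace IDs; step 2: one request per ID for its spans."""
    with urlopen(f"{TEMPO}/api/search?tags={tags}&limit=999999") as resp:
        ids = search_trace_ids(resp.read().decode())
    traces = {}
    for tid in ids:  # this loop is exactly the N-requests problem described below
        with urlopen(f"{TEMPO}/api/traces/{tid}") as resp:
            traces[tid] = json.loads(resp.read())
    return traces
```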

If I have e.g. 10,000 trace IDs then I'd prefer not to issue 10,000 HTTP requests...

Is there a better way to download all trace IDs and associated span IDs en masse?

[1] https://grafana.com/docs/tempo/latest/api_docs/#:~:text=Tempo's%20Search%20API%20finds%20traces,in%20a%20monolithic%20mode%20deployment.

Describe the solution you'd like

Something similar to the Jaeger all-in-one API request, where all trace IDs and associated span IDs for a particular service can be downloaded en masse.

Describe alternatives you've considered

I have considered building e.g. a front door for Tempo so that traces can effectively be teed into both Tempo and a compressed file for later processing... then I wouldn't need to download all the info from Tempo after the fact...

Additional context

n/a

simonhf avatar Jun 14 '22 01:06 simonhf

As you've found, the Tempo search API does not return complete traces; instead it returns some basic metadata about the traces it finds.

A few things come to mind with your use case:

  1. Searching by trace ID is quite performant, depending on the amount of data in your backend. You can also use time ranges to make it even more so: https://github.com/grafana/tempo/pull/1388. Perhaps downloading 10k traces can be done quickly enough?
  2. Scanning the blocks directly is not particularly difficult. You could run batch jobs against your blocks that look for traces matching some criteria and then ship them wherever you'd like: https://github.com/grafana/tempo/blob/main/cmd/tempo-cli/cmd-search.go
  3. We have discussed in the past (but not implemented) a batch "trace by id" endpoint. This endpoint would allow multiple trace IDs to be passed in and would find them all simultaneously. This could be quite a bit more efficient than finding them individually.
  4. Perhaps a new endpoint could both search and return the entire trace? We are right now in the middle of a backend format migration which would complicate adding functionality like this. We would likely delay a change like this until our backend formats settle.
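On option 1, the per-trace lookups can at least be parallelized from the client side, and the time-range hint from the linked PR narrows the blocks each lookup has to touch. A rough sketch, assuming a Tempo query endpoint at port 3200 and `start`/`end` unix-epoch query parameters on the trace-by-id path (both are assumptions, not confirmed API details):

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

TEMPO = "http://localhost:3200"  # assumed Tempo query endpoint

def build_url(trace_id: str, start: int, end: int) -> str:
    """Trace-by-id URL with a unix-epoch time-range hint."""
    return f"{TEMPO}/api/traces/{trace_id}?start={start}&end={end}"

def fetch_one(trace_id: str, start: int, end: int) -> dict:
    with urlopen(build_url(trace_id, start, end)) as resp:
        return json.loads(resp.read())

def fetch_many(trace_ids, start, end, workers=32):
    """Amortize thousands of individual lookups across a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: fetch_one(t, start, end), trace_ids))
```

With 32 workers, 10k lookups become ~300 sequential round-trips' worth of wall time, which may be "quickly enough" for some use cases even without a batch endpoint.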

WDYT?

joe-elliott avatar Jun 14 '22 14:06 joe-elliott

Thanks for the quick response.

I had a quick look at option 1 and experimented with downloading a trace or two. However, I'm not convinced I was getting the same complete data that I was getting with Jaeger all-in-one. Rather than fiddle around further to find out why -- only to potentially discover that it's not fast enough to download all traces and still fail -- I decided to revert to Jaeger all-in-one for the time being.

When the backend format migration is done, and if you get around to implementing option 4, then I volunteer to test this out for you :-)

Option 3 also sounds good, and I volunteer to test that too. But it doesn't sound quite as good as option 4 :-) Sending e.g. 1 million trace IDs for option 3 sounds doable... but wouldn't be necessary with option 4 :-)

simonhf avatar Jun 17 '22 03:06 simonhf

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.

github-actions[bot] avatar Nov 14 '22 00:11 github-actions[bot]