thanos
thanos copied to clipboard
Tracing: Add Jaeger B3 header support
- [x] I added CHANGELOG entry for this change.
- [ ] Change is not relevant to the end user.
Changes
Jaeger tracing uses its own HTTP/GRPC header to perform context propagation between services - "uber-trace-id". However, Jaeger tracing clients also support context propagation via the B3 header format (as popularized by Zipkin tracing).
This commit adds support for configuring the B3 header format within thanos tracing. The main motivation for doing this is interoperability with other systems that may call into or out of Thanos components - for example, Grafana or Envoy - both of which support Jaeger tracing, but use B3 headers instead of the Jaeger-specific "uber-trace-id" format headers.
Unfortunately, we cannot directly toggle this as an option or configure
it via the environment; the jaeger-go library requires us to pass it in
as an option when we call NewTracer. In order to facilitate this, we
extend the ParseConfigFromYaml file to not only return a Jaeger
configuration object, but the actual YAML configuration object as well.
This allows us to check and see if the user requested this kind of
header propagation, and we can then construct the necessary additional
options to NewTracer.
Of course, this is all legacy configuration in many ways. The tracing world is rapidly moving towards OpenTelemetry and W3C headers for context propagation. However, there is still utility in supporting B3 headers until everyone (including Thanos!) finishes their OpenTelemetry migration projects.
Verification
Programmatically testing this change is tricky, because we need to actually inspect the headers of the HTTP and GRPC requests that flow through the system with tracing enabled. I was able to verify that this worked through manual testing via the query frontend:
- I started a local ngrok tunnel, which allows me to intercept requests between the query frontend and a downstream querier (I configured it statically for port 10904, which is one of the querier ports).
- I launched a local OpenTelemetry Collector and Jaeger all-in-one image to receive and display traces.
- I modified
scripts/quickstart.shto:
- Add a query frontend, with the
query-frontend.downstream-urlpointed to the interceptingngroktunnel. - Add appropriate tracing configurations for all components.
Once those modifications were in place and launched with make quickstart,
I was able to write queries in the query frontend, see that the intercepted
API calls to the downstream querier contained tracing headers in the
expected format, and verify that the traces were ingested into the Jaeger
backend via the OpenTelemetry collector (the OpenTelemetry collector was not
strictly necessary, but I prefer to use it while debugging).
Screenshot showing the intercepted API request, with correct B3 headers:

Screenshot of the complete trace ingested to the Jaeger backend, with
correct context-propagation and parent/child span relationships preserved:

cc @fpetkovski, since I think you may be interested in this.
Since we have another WIP pr for jaeger Opentelemetry migration, maybe it is better to have this after https://github.com/thanos-io/thanos/pull/5411 is done?
Is there a reason not to add these by default?
@GiedriusS That depends on exactly what you mean with the question.
- If you mean "make the default
true", then I would say that it's a breaking change for anyone that's relying on the current behavior today (and I don't know how many people that might be). - If you mean "why not always add those headers in addition to the normal
uber-trace-idheader", that's certainly possible. In OpenTelemetry terms, that would be a "composite propagator", and it isn't extraordinarily difficult to implement. However, the OpenTracing standard never defined such a concept and so thejaeger-client-golibrary didn't implement one (as far as I can see, anyways). We'd just need to write that here. That said: I don't know if enough people would want it to justify the effort.
Since we have another WIP pr for jaeger Opentelemetry migration, maybe it is better to have this after https://github.com/thanos-io/thanos/pull/5411 is done?
@yeya24 If that's going to merge soon, then perhaps it is better to wait. My impression was that the PR may not be close to merging given the length of time it has been open, but that impression may have been mistaken! 😄
That said - it looks like #5411 would opentracing "basic" propagation for all supported tracing backends, which may not be what is desired (or intended). There's a new autoprop package for the otel-go SDK which may be an attractive option, since it would allow us to easily extend support for additional propagation types and also allows for the otel-standard OTEL_PROPAGATORS env var support. If included in #5411, that could make this PR obsolete entirely.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Looks like this PR is not ready for review. Does it make sense to focus efforts there @ahayworth ?
Looks like this PR is not ready for review.
It was at the time it was submitted! 😄 But, it would have to be reworked since #5411 merged.
Does it make sense to focus efforts there @ahayworth ?
No, I don't think so (and I won't continue this PR). #5411 allows me to solve the problem of interoperability that I faced in an entirely different way: using OpenTelemetry native tracing and propagation. (Not even just different, I think it's a better solution!).
I'll close this for now, but if anyone else needs this support in the future the PR can serve as a helpful guide for anyone that looks to implement B3 header support for the legacy Jaeger tracing.