thanos icon indicating copy to clipboard operation
thanos copied to clipboard

Tracing: Add Jaeger B3 header support

Open ahayworth opened this issue 3 years ago • 3 comments

  • [x] I added CHANGELOG entry for this change.
  • [ ] Change is not relevant to the end user.

Changes

Jaeger tracing uses its own HTTP/GRPC header to perform context propagation between services - "uber-trace-id". However, Jaeger tracing clients also support context propagation via the B3 header format (as popularized by Zipkin tracing).

This commit adds support for configuring the B3 header format within thanos tracing. The main motivation for doing this is interoperability with other systems that may call into or out of Thanos components - for example, Grafana or Envoy - both of which support Jaeger tracing, but use B3 headers instead of the Jaeger-specific "uber-trace-id" format headers.

Unfortunately, we cannot directly toggle this as an option or configure it via the environment; the jaeger-go library requires us to pass it in as an option when we call NewTracer. In order to facilitate this, we extend the ParseConfigFromYaml file to not only return a Jaeger configuration object, but the actual YAML configuration object as well. This allows us to check and see if the user requested this kind of header propagation, and we can then construct the necessary additional options to NewTracer.

Of course, this is all legacy configuration in many ways. The tracing world is rapidly moving towards OpenTelemetry and W3C headers for context propagation. However, there is still utility in supporting B3 headers until everyone (including Thanos!) finishes their OpenTelemetry migration projects.

Verification

Programmatically testing this change is tricky, because we need to actually inspect the headers of the HTTP and GRPC requests that flow through the system with tracing enabled. I was able to verify that this worked through manual testing via the query frontend:

  1. I started a local ngrok tunnel, which allows me to intercept requests between the query frontend and a downstream querier (I configured it statically for port 10904, which is one of the querier ports).
  2. I launched a local OpenTelemetry Collector and Jaeger all-in-one image to receive and display traces.
  3. I modified scripts/quickstart.sh to:
  • Add a query frontend, with the query-frontend.downstream-url pointed to the intercepting ngrok tunnel.
  • Add appropriate tracing configurations for all components.

Once those modifications were in place and launched with make quickstart, I was able to write queries in the query frontend, see that the intercepted API calls to the downstream querier contained tracing headers in the expected format, and verify that the traces were ingested into the Jaeger backend via the OpenTelemetry collector (the OpenTelemetry collector was not strictly necessary, but I prefer to use it while debugging).

Screenshot showing the intercepted API request, with correct B3 headers: Screen Shot 2022-07-29 at 11 16 35 AM

Screenshot of the complete trace ingested to the Jaeger backend, with correct context-propagation and parent/child span relationships preserved: Screen Shot 2022-07-29 at 11 16 44 AM

ahayworth avatar Jul 29 '22 16:07 ahayworth

cc @fpetkovski, since I think you may be interested in this.

ahayworth avatar Jul 29 '22 16:07 ahayworth

Since we have another WIP pr for jaeger Opentelemetry migration, maybe it is better to have this after https://github.com/thanos-io/thanos/pull/5411 is done?

yeya24 avatar Jul 30 '22 14:07 yeya24

Is there a reason not to add these by default?

@GiedriusS That depends on exactly what you mean with the question.

  • If you mean "make the default true", then I would say that it's a breaking change for anyone that's relying on the current behavior today (and I don't know how many people that might be).
  • If you mean "why not always add those headers in addition to the normal uber-trace-id header", that's certainly possible. In OpenTelemetry terms, that would be a "composite propagator", and it isn't extraordinarily difficult to implement. However, the OpenTracing standard never defined such a concept and so the jaeger-client-go library didn't implement one (as far as I can see, anyways). We'd just need to write that here. That said: I don't know if enough people would want it to justify the effort.

Since we have another WIP pr for jaeger Opentelemetry migration, maybe it is better to have this after https://github.com/thanos-io/thanos/pull/5411 is done?

@yeya24 If that's going to merge soon, then perhaps it is better to wait. My impression was that the PR may not be close to merging given the length of time it has been open, but that impression may have been mistaken! 😄

That said - it looks like #5411 would opentracing "basic" propagation for all supported tracing backends, which may not be what is desired (or intended). There's a new autoprop package for the otel-go SDK which may be an attractive option, since it would allow us to easily extend support for additional propagation types and also allows for the otel-standard OTEL_PROPAGATORS env var support. If included in #5411, that could make this PR obsolete entirely.

ahayworth avatar Aug 01 '22 16:08 ahayworth

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Oct 01 '22 10:10 stale[bot]

Looks like this PR is not ready for review. Does it make sense to focus efforts there @ahayworth ?

fpetkovski avatar Oct 01 '22 13:10 fpetkovski

Looks like this PR is not ready for review.

It was at the time it was submitted! 😄 But, it would have to be reworked since #5411 merged.

Does it make sense to focus efforts there @ahayworth ?

No, I don't think so (and I won't continue this PR). #5411 allows me to solve the problem of interoperability that I faced in an entirely different way: using OpenTelemetry native tracing and propagation. (Not even just different, I think it's a better solution!).

I'll close this for now, but if anyone else needs this support in the future the PR can serve as a helpful guide for anyone that looks to implement B3 header support for the legacy Jaeger tracing.

ahayworth avatar Oct 03 '22 12:10 ahayworth