cadence-client icon indicating copy to clipboard operation
cadence-client copied to clipboard

Add support for OpenTelemetry via the Otel bridge

Open tfcace opened this issue 2 years ago • 9 comments

Related issue: https://github.com/uber-go/cadence-client/issues/1156

What changed? Support for Otel was added in the form of the opentracing Bridge.

This will allow a user of the SDK to set an Otel tracer + an Ot bridge. The bridge tracer will conform with the opentracing.Tracer interface with minimal impact on the Cadence SDK, while passing all tracing responsibility to the Otel tracer.

Probably the effort to replace the Ot interface with the one for Otel is more substantial, and it was just a matter of effort / capacity.

Why? Want to be able to propagate traces to multiple Otel backends, not just Jaeger.

How did you test it? Ran unit tests and integration tests. Also verified manually with Jaeger and Google Cloud tracing

Potential risks Nothing I can think of.

tfcace avatar May 10 '22 15:05 tfcace

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 10 '22 15:05 CLAassistant

Hi @Groxx, any clue if the tests passed?

tfcace avatar May 22 '22 11:05 tfcace

Sorry about the delays :| Got busy and then moved across the country, and kinda only now catching up again.

I'll re-trigger the test runs, though I suspect tests will pass... which doesn't quite cover it for this, as there's a bunch of room for subtler changes like different data emitted in prod and possibly-different semantics :\

I'm attempting to find more specific docs for the terms (thanks for the links btw), and I think we'll try running this in our Jaeger stack internally to make sure it all works / still looks sane. An exact 1:1 isn't really necessary, it just has to be reasonable.

Groxx avatar Jun 22 '22 03:06 Groxx

I've got a fix for one piece of the test failures currently, in #1175 (in the meantime, just make copyright), and I'm not sure if it was working before or not, but it needs a go mod tidy. Then unit tests pass at least.

Groxx avatar Jun 22 '22 03:06 Groxx

Hi @Groxx, thanks. You're last comment was applied. Crossing fingers :)

tfcace avatar Jun 22 '22 04:06 tfcace

Tests pass, I kinda have to figure out a way to make sure the output data still works though :| I'm not entirely sure how to do that off-hand. We have some pre-production environments I could just deploy it to as a fallback, but I kinda think it'd be worth making a docker-compose setup in the server repo with jaeger 🤔 . It'd be useful for this, and for anyone making future tracing changes (I definitely want to do so!), and for users looking to do their own tracing stuff locally.

If you want to give building that a try, feel free, otherwise unfortunately I've got to focus on some other stuff in the short term (this week?). But I'll try to hack something up ASAP, and we can leave the "get a dev-friendly jaeger-including docker-compose PR we can merge to master" for some other day.

Groxx avatar Jun 22 '22 19:06 Groxx

Hi @Groxx, that's good news. Can't say I totally understand what is the desired outcome (assuming your internal outputs that you expect to see).

For what it's worth, there's a tracing sample I cooked up with @longquanzheng a while back in https://github.com/uber-common/cadence-samples/tree/master/cmd/samples/recipes/tracing

It should work with the Jaeger all-in-one docker.

I'll be AFK for a few days starting tomorrow. But I may be able to take a stab at modifying the docker compose. Can you add some details? Does it go in the main Cadence repo as new PR? Assuming that the compose also spins the Jaeger all-in-one, how does the Cadence team expect to make sure there's no regression?

Anyways, since I'm very interested in incorporating Otel into our Cadence stack, but don't want to rely on a fork, I want to do my best to help push this forward.

Cheers.

tfcace avatar Jun 22 '22 19:06 tfcace

Hi @Groxx, that's good news. Can't say I totally understand what is the desired outcome (assuming your internal outputs that you expect to see).

tbh me neither :) gonna just look at some traces before and after and see if it looks borked.

For what it's worth, there's a tracing sample I cooked up with @longquanzheng a while back in https://github.com/uber-common/cadence-samples/tree/master/cmd/samples/recipes/tracing

It should work with the Jaeger all-in-one docker.

Excellent, I'll give that a look, thanks!

I'll be AFK for a few days starting tomorrow. But I may be able to take a stab at modifying the docker compose. Can you add some details? Does it go in the main Cadence repo as new PR?

Eventually at least, yes. For verifying this I think it doesn't really matter, but a PR is at least an easy way to share code.

Assuming that the compose also spins the Jaeger all-in-one, how does the Cadence team expect to make sure there's no regression?

I'm planning on making it a non-default compose file, so preventing regression probably doesn't matter all that much. E.g. we can just call it an unstable dev setup and be done with it.

Actually committing to stability will probably require some E2E tests to prevent accidents, and I have no idea what that'd look like at the moment. Jaeger has APIs of course, so something is definitely possible.

Groxx avatar Jun 22 '22 20:06 Groxx

hi @Groxx, have you had a chance to play with this for abit?

tfcace avatar Jul 24 '22 08:07 tfcace

Using Otel can be achieved by using a Workflow Interceptor instead of modifying the SDK. Closing this PR.

tfcace avatar Sep 07 '23 19:09 tfcace