sentry-go icon indicating copy to clipboard operation
sentry-go copied to clipboard

ref(otel): Add `ClientOptions.Instrumenter`

Open costela opened this issue 1 year ago • 25 comments

This small PR is a first attempt to fix the perceived issue in #678.

It extends sentry.ClientOptions with the Instrumenter parameter, analogous to other language SDKs. This can be optionally set to "otel" or "sentry" (the default).

This new option is then respected by middlewares to optionally skip creating a sentry transaction, allowing the consumer to only use "pure" otel for traces.

Considered alternatives:

Another option would be making the transaction created by the otel span processor visible to the middlewares. However, since the span processor cannot easily modify the request context, it becomes necessary to add additional helpers (e.g. a new "adapter" middleware) with the single task of injecting the otel-created transaction into the request context.

Fixes: #678 #788

costela avatar Jul 28 '23 22:07 costela

Codecov Report

Attention: Patch coverage is 97.43590% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 81.20%. Comparing base (255c172) to head (8249502).

Files Patch % Lines
otel/span_processor.go 88.88% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #679      +/-   ##
==========================================
+ Coverage   80.89%   81.20%   +0.30%     
==========================================
  Files          48       48              
  Lines        4717     4730      +13     
==========================================
+ Hits         3816     3841      +25     
+ Misses        763      754       -9     
+ Partials      138      135       -3     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jul 31 '23 13:07 codecov[bot]

The windows test failure seems unrelated? Maybe flaky?

As for the coverage decrease, I could add some tests for this case if the general approach seems reasonable.

costela avatar Jul 31 '23 13:07 costela

Yep, Windows tests are sadly a bit flaky 🙁 More tests are always welcome. I didn't look into how other SDKs handle this yet, did you maybe check https://github.com/getsentry/sentry-javascript already?

cleptric avatar Jul 31 '23 13:07 cleptric

Sorry for the delay; vacation got in the way :wink:

My javascript is pretty rusty (no :crab: pun intended), but AFAICT the js integration works slightly differently. It always uses an existing span context (similarly to the proposed change in this PR), but it also has an extra instrumenter option, unlike the go integration (and this PR, for that matter).

The extra option seems a bit error-prone, compared to the sample-func approach in this PR: the code has to remember to check against known values, instead of explicitly passing exactly what we want it to do. But that's just my subjective (and superficial!) take on it.

costela avatar Aug 16 '23 08:08 costela

I've reworked the PR with a slightly different solution. Now instead of trying to use a sampler, we address the issue at the middleware level.

The problem - as I curently understand it - is related to middleware call-ordering: we want otel to be called first, to respect potential sampling decisions from a remote caller. That sampling decision should be passed to the sentry world, which was already done in the otel span processor, but was not working reliably (fixed in this PR).

After the otel middleware, we still want to use the sentry middleware (at least for event tracking and recovery), but at that point, the existing sentry span created in the span processor above is not in the context, so the sentry middleware happily creates a new one (and without a parent). To avoid this, we need to tell the sentry middleware to also look for an existing otel span and get the sentry span from that.

Does that make sense?

costela avatar Aug 23 '23 08:08 costela

Ok, third time is a charm!

The sentry.SpanOption doesn't quite work, because it can't modify it's received context. So we have to go with yet another middleware, which sits as an adapter between otel and sentryhttp.Handler. It sets the transaction created via the span processor in the context passed to sentryhttp.Handler, so that it detects its existence and can react accordingly.

PTAL

costela avatar Aug 24 '23 11:08 costela

@cleptric can you please allow the tests to run? :pray:

costela avatar Aug 28 '23 10:08 costela

FYI: :repeat: rebased on master, please unlock tests :pray:

costela avatar Sep 19 '23 09:09 costela

Sorry that it took us a while to get back to you.

When using Otel, you shouldn't use any Sentry performance-related features. This is why we added the instrumenter option on other SDKs, which would disable all Sentry instrumentation. In Go, we opted not to add this option, given the manual nature of performance tracing.

So instead of using Sentry's net/http instrumentation, you would use https://github.com/open-telemetry/opentelemetry-go-contrib.

Our propagator considers incoming trace headers, so distributed tracing also works.

cleptric avatar Oct 04 '23 09:10 cleptric

So instead of using Sentry's net/http instrumentation, you would use https://github.com/open-telemetry/opentelemetry-go-contrib.

So you mean skipping the http middleware? But then we'd also throw out error/panic tracking?

If that's the intended way to work with OTEL, then maybe I should change this PR to instead add an option to sentryhttp.Options to skip traces?

costela avatar Oct 04 '23 12:10 costela

This is a good point about panic tracking I haven't thought about.

IMO, we could wrap this code https://github.com/getsentry/sentry-go/blob/master/http/sentryhttp.go#L97-L112 into an if condition that checks for ClientOptions.EnableTracing, so no transactions are created.

This might make it a bit easier than adding a new integration option.

cleptric avatar Oct 04 '23 12:10 cleptric

@cleptric you mean disabling ClientOptions.EnableTracing? I think that alone won't quite work either: then all transactions get dropped (even if the OTEL propagator decides to sample them).

We could maybe refactor the sampling logic to respect sample decisions even if EnableTracing is false, but I fear that might be too surprising to users.

In the end we want sentryhttp.Handler to only start a transaction if none has been started already (e.g. by the otel propagator), but unfortunately I haven't found a nice way to pass that informatino along. The propagator can't modify the request context and they are otherwise unaware of each other. That's why I ended up with the current suggestion in this PR. :confused:

It seems to me we have two options:

  • we either need to glue the sentry OTEL propagator to the sentryhttp middleware (e.g. via the proposed middleware in this PR)
  • or we add an explicit option to the sentryhttp middleware to not start transactions

costela avatar Oct 04 '23 12:10 costela

Ok, we might need to add the instrumenter option to control the behaviour. If Otel is active, no transactions/spans should be created by the Sentry SDK.

cleptric avatar Oct 04 '23 13:10 cleptric

I think adding the instrumenter option also seems to be consistent with the other sdks.

greywolve avatar Oct 04 '23 13:10 greywolve

@cleptric @greywolve now this PR has an attempted implementation of the new TracesInstrumenter option. PTAL :pray:

costela avatar Oct 04 '23 15:10 costela

~~I want to add another perspective to this MR, that may mean the ClientOptions.Instrumenter is not enough: some client SDKs (e.g. sentry-cocoa) do not seem to support opentelemetry at all. These transactions would be lost when speaking to a sentry+otel service. The service only reads otel headers, the client only sends sentry headers.~~

~~The suggestion with the additional middleware would not have this problem: incoming requests could use either sentry or otel propagation, the service code can leverage existing otel integrations (e.g. for pgx; shameless plug :see_no_evil:)~~

~~Unless sentry plans to support otel propagation on client sdks, it may be necessary to make the service side accept both. Or am I overlooking something?~~

(nevermind, missed the propagator code, sorry for the noise :bow:)

costela avatar Oct 12 '23 17:10 costela

We actually extract the Sentry trace header, see

https://github.com/getsentry/sentry-go/blob/master/otel/propagator.go#L77-L95

cleptric avatar Oct 13 '23 02:10 cleptric

@cleptric ah, you're right of course. Appologies for the noise; disregard the last comment.

costela avatar Oct 13 '23 07:10 costela

LGTM, but I'll let @cleptric give the final thumbs up :)

greywolve avatar Nov 09 '23 02:11 greywolve

hi! just a little bi-monthly rebase + reminder about this PR :innocent:

costela avatar Nov 21 '23 13:11 costela

maybe I can get this PR as a christmas present? :innocent: :santa: :christmas_tree:

(just rebased :repeat:)

costela avatar Dec 18 '23 08:12 costela

Rebased again.

Any chance a paying customer can get some feedback? Even if this isn't the right solution?

costela avatar Feb 23 '24 08:02 costela

It's still on our radar!

cleptric avatar Feb 26 '24 18:02 cleptric

@cleptric customer has requested an update on this PR. Is there any?

rodolfoBee avatar Jun 05 '24 05:06 rodolfoBee

@cleptric any updates for this?

aldy505 avatar Aug 20 '24 00:08 aldy505