sentry-go
ref(otel): Add `ClientOptions.Instrumenter`
This small PR is a first attempt to fix the perceived issue in #678.
It extends `sentry.ClientOptions` with an `Instrumenter` parameter, analogous to other language SDKs. This can optionally be set to `"otel"` or `"sentry"` (the default).
This new option is then respected by the middlewares, which can optionally skip creating a Sentry transaction, allowing the consumer to use only "pure" otel for traces.
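A minimal sketch of how a consumer might opt into pure otel tracing with this option. Note that the `Instrumenter` field and its `"otel"`/`"sentry"` values are the ones proposed in this PR, not a released sentry-go API, and the final name may differ:

```go
package main

import "github.com/getsentry/sentry-go"

func main() {
	// Sketch only: Instrumenter and its "otel"/"sentry" values are the ones
	// proposed in this PR, not an existing sentry-go option.
	err := sentry.Init(sentry.ClientOptions{
		Dsn:              "https://examplePublicKey@o0.ingest.sentry.io/0",
		EnableTracing:    true,
		TracesSampleRate: 1.0,
		Instrumenter:     "otel", // let otel own tracing; Sentry middlewares skip transactions
	})
	if err != nil {
		panic(err)
	}
}
```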
Considered alternatives:
Another option would be making the transaction created by the otel span processor visible to the middlewares. However, since the span processor cannot easily modify the request context, it becomes necessary to add additional helpers (e.g. a new "adapter" middleware) with the single task of injecting the otel-created transaction into the request context.
Fixes: #678 #788
Codecov Report
Attention: Patch coverage is 97.43590% with 1 line in your changes missing coverage. Please review.
Project coverage is 81.20%. Comparing base (255c172) to head (8249502).
Files | Patch % | Lines |
---|---|---|
otel/span_processor.go | 88.88% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #679 +/- ##
==========================================
+ Coverage 80.89% 81.20% +0.30%
==========================================
Files 48 48
Lines 4717 4730 +13
==========================================
+ Hits 3816 3841 +25
+ Misses 763 754 -9
+ Partials 138 135 -3
The Windows test failure seems unrelated? Maybe flaky?
As for the coverage decrease, I could add some tests for this case if the general approach seems reasonable.
Yep, Windows tests are sadly a bit flaky 🙁 More tests are always welcome. I didn't look into how other SDKs handle this yet. Did you maybe check https://github.com/getsentry/sentry-javascript already?
Sorry for the delay; vacation got in the way :wink:
My JavaScript is pretty rusty (no :crab: pun intended), but AFAICT the JS integration works slightly differently. It always uses an existing span context (similarly to the proposed change in this PR), but it also has an extra `instrumenter` option, unlike the Go integration (and this PR, for that matter).
The extra option seems a bit error-prone, compared to the sample-func approach in this PR: the code has to remember to check against known values, instead of explicitly passing exactly what we want it to do. But that's just my subjective (and superficial!) take on it.
I've reworked the PR with a slightly different solution. Now instead of trying to use a sampler, we address the issue at the middleware level.
The problem, as I currently understand it, is related to middleware call-ordering: we want otel to be called first, to respect potential sampling decisions from a remote caller. That sampling decision should be passed to the sentry world, which was already done in the otel span processor, but was not working reliably (fixed in this PR).
After the otel middleware, we still want to use the sentry middleware (at least for event tracking and recovery), but at that point, the existing sentry span created in the span processor above is not in the context, so the sentry middleware happily creates a new one (and without a parent). To avoid this, we need to tell the sentry middleware to also look for an existing otel span and get the sentry span from that.
Does that make sense?
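To illustrate the ordering being described, here is a sketch using the `otelhttp` middleware from opentelemetry-go-contrib wrapped around the Sentry handler, so otel sees the incoming request (and any remote sampling decision) first. This assumes `sentry.Init` has already been called; the wiring is illustrative, not the PR's code:

```go
package main

import (
	"net/http"

	sentryhttp "github.com/getsentry/sentry-go/http"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	// The Sentry middleware sits inside, so it runs after otel has started its
	// span and applied any remote sampling decision.
	sentryHandler := sentryhttp.New(sentryhttp.Options{})
	handler := otelhttp.NewHandler(sentryHandler.Handle(mux), "http.server")

	http.ListenAndServe(":8080", handler)
}
```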
Ok, third time is a charm!
The `sentry.SpanOption` doesn't quite work, because it can't modify its received context. So we have to go with yet another middleware, which sits as an adapter between otel and `sentryhttp.Handler`. It sets the transaction created via the span processor in the context passed to `sentryhttp.Handler`, so that it detects its existence and can react accordingly.
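Roughly, such an adapter could look like the sketch below. The lookup helper (`transactionFromOtelSpan`) is purely hypothetical and stands in for whatever mechanism the PR uses to reach the transaction created by the span processor:

```go
package adapter

import (
	"net/http"

	"github.com/getsentry/sentry-go"
	"go.opentelemetry.io/otel/trace"
)

// transactionFromOtelSpan is a hypothetical lookup standing in for however the
// PR maps the active otel span to the Sentry transaction the span processor created.
func transactionFromOtelSpan(span trace.Span) *sentry.Span {
	// ... implementation would live in the otel integration ...
	return nil
}

// otelSentryAdapter puts that transaction into the request context so the
// downstream sentryhttp.Handler finds it instead of starting a second,
// parentless transaction.
func otelSentryAdapter(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		if tx := transactionFromOtelSpan(trace.SpanFromContext(ctx)); tx != nil {
			ctx = tx.Context()
		}
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}
```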
PTAL
@cleptric can you please allow the tests to run? :pray:
FYI: :repeat: rebased on master, please unlock tests :pray:
Sorry that it took us a while to get back to you.
When using Otel, you shouldn't use any Sentry performance-related features. This is why we added the `instrumenter` option on other SDKs, which would disable all Sentry instrumentation. In Go, we opted not to add this option, given the manual nature of performance tracing.
So instead of using Sentry's `net/http` instrumentation, you would use https://github.com/open-telemetry/opentelemetry-go-contrib. Our propagator considers incoming trace headers, so distributed tracing also works.
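For reference, this is the kind of setup being suggested; a sketch assuming tracing is enabled in `sentry.Init` and using the existing Sentry otel span processor and propagator so incoming Sentry trace headers are honoured:

```go
package main

import (
	"net/http"

	"github.com/getsentry/sentry-go"
	sentryotel "github.com/getsentry/sentry-go/otel"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	if err := sentry.Init(sentry.ClientOptions{
		EnableTracing:    true,
		TracesSampleRate: 1.0,
	}); err != nil {
		panic(err)
	}

	// Send otel spans to Sentry and honour incoming Sentry trace headers.
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithSpanProcessor(sentryotel.NewSentrySpanProcessor()),
	)
	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(sentryotel.NewSentryPropagator())

	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	// otelhttp from opentelemetry-go-contrib instead of sentryhttp.
	http.ListenAndServe(":8080", otelhttp.NewHandler(mux, "http.server"))
}
```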
> So instead of using Sentry's `net/http` instrumentation, you would use https://github.com/open-telemetry/opentelemetry-go-contrib.
So you mean skipping the http middleware? But then we'd also throw out error/panic tracking?
If that's the intended way to work with OTEL, then maybe I should change this PR to instead add an option to `sentryhttp.Options` to skip traces?
This is a good point about panic tracking that I hadn't thought about.
IMO, we could wrap this code https://github.com/getsentry/sentry-go/blob/master/http/sentryhttp.go#L97-L112 in an if condition that checks `ClientOptions.EnableTracing`, so no transactions are created. This might make it a bit easier than adding a new integration option.
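A hedged sketch of that idea, not the actual sentryhttp source: the middleware shape is simplified and only illustrates making the transaction conditional while the wrapped handler still runs either way:

```go
package middleware

import (
	"fmt"
	"net/http"

	"github.com/getsentry/sentry-go"
)

// handleWithOptionalTransaction only starts a Sentry transaction when tracing
// is enabled on the client; the wrapped handler (and, in the real sentryhttp
// handler, panic recovery) runs regardless.
func handleWithOptionalTransaction(hub *sentry.Hub, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := sentry.SetHubOnContext(r.Context(), hub)
		if client := hub.Client(); client != nil && client.Options().EnableTracing {
			transaction := sentry.StartTransaction(ctx,
				fmt.Sprintf("%s %s", r.Method, r.URL.Path),
				sentry.ContinueFromRequest(r),
			)
			defer transaction.Finish()
			ctx = transaction.Context()
		}
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}
```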
@cleptric you mean disabling `ClientOptions.EnableTracing`? I think that alone won't quite work either: then all transactions get dropped (even if the OTEL propagator decides to sample them). We could maybe refactor the sampling logic to respect sample decisions even if `EnableTracing` is `false`, but I fear that might be too surprising to users.
In the end we want `sentryhttp.Handler` to only start a transaction if none has been started already (e.g. by the otel propagator), but unfortunately I haven't found a nice way to pass that information along. The propagator can't modify the request context and they are otherwise unaware of each other. That's why I ended up with the current suggestion in this PR. :confused:
It seems to me we have two options:
- we either need to glue the sentry OTEL propagator to the sentryhttp middleware (e.g. via the proposed middleware in this PR)
- or we add an explicit option to the sentryhttp middleware to not start transactions
Ok, we might need to add the `instrumenter` option to control the behaviour. If Otel is active, no transactions/spans should be created by the Sentry SDK.
I think adding the `instrumenter` option also seems consistent with the other SDKs.
@cleptric @greywolve now this PR has an attempted implementation of the new `TracesInstrumenter` option. PTAL :pray:
~~I want to add another perspective to this MR, one that may mean `ClientOptions.Instrumenter` is not enough: some client SDKs (e.g. sentry-cocoa) do not seem to support opentelemetry at all. These transactions would be lost when speaking to a sentry+otel service: the service only reads otel headers, while the client only sends sentry headers.~~
~~The suggestion with the additional middleware would not have this problem: incoming requests could use either sentry or otel propagation, and the service code can leverage existing otel integrations (e.g. for pgx; shameless plug :see_no_evil:).~~
~~Unless sentry plans to support otel propagation on client SDKs, it may be necessary to make the service side accept both. Or am I overlooking something?~~
(nevermind, missed the propagator code, sorry for the noise :bow:)
We actually extract the Sentry trace header, see
https://github.com/getsentry/sentry-go/blob/master/otel/propagator.go#L77-L95
@cleptric ah, you're right of course. Apologies for the noise; disregard the last comment.
LGTM, but I'll let @cleptric give the final thumbs up :)
hi! just a little bi-monthly rebase + reminder about this PR :innocent:
maybe I can get this PR as a christmas present? :innocent: :santa: :christmas_tree:
(just rebased :repeat:)
Rebased again.
Any chance a paying customer can get some feedback? Even if this isn't the right solution?
It's still on our radar!
@cleptric customer has requested an update on this PR. Is there any?
@cleptric any updates for this?