
Add basic OpenTelemetry support

mjq-sentry opened this issue • 3 comments

Note: This was a hack week project. Submitting draft for feedback.


For users who already use OpenTelemetry instrumentation or would prefer to use it instead of Sentry's own tracing API, provide a means to disable Sentry's automatic trace instrumentation and a span processor that exports OpenTelemetry spans to Sentry.

Usage

Sentry.init do |config|
  # ...

  # Enable trace collection
  config.traces_sample_rate = 1.0  # or use traces_sampler

  # Disable automatic span instrumentation
  config.auto_instrument_traces = false
end

OpenTelemetry::SDK.configure do |c|
  # ...

  c.add_span_processor(Sentry::OpenTelemetry::SpanProcessor.new)
end

Setting auto_instrument_traces to false in Sentry.init stops Sentry from automatically creating any transactions or spans. (Tracing must still be enabled with traces_sample_rate or traces_sampler.)

Sentry::OpenTelemetry::SpanProcessor is notified whenever an OTel span is started or stopped. In reaction, it starts or stops a Sentry transaction (if the span has no parent) or span. When the transaction is finished it is picked up and processed by Sentry's worker thread like any other transaction would be.
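The mapping described above can be sketched in plain Ruby. This is an illustrative stand-in, not the gem's actual code: `OtelSpan` is a toy struct (the real processor receives `OpenTelemetry::SDK::Trace::Span` objects), and the hash records stand in for Sentry transactions/spans. Only the parent-or-not decision and the `span_map` bookkeeping are meant to match the design discussed here.

```ruby
# Toy stand-in for an OTel span; real spans carry much more context.
OtelSpan = Struct.new(:span_id, :parent_span_id, :name, keyword_init: true)

class SketchSpanProcessor
  def initialize
    @span_map = {} # otel span_id => our transaction/span record
  end

  # Called by the OTel SDK whenever a span starts.
  def on_start(otel_span, _parent_context = nil)
    parent = @span_map[otel_span.parent_span_id]
    record =
      if parent.nil?
        # No known parent: treat it as the root of a Sentry transaction.
        { type: :transaction, name: otel_span.name, finished: false }
      else
        # Known parent: attach a child span to it.
        { type: :span, name: otel_span.name, parent: parent, finished: false }
      end
    @span_map[otel_span.span_id] = record
  end

  # Called by the OTel SDK whenever a span finishes.
  def on_finish(otel_span)
    record = @span_map.delete(otel_span.span_id)
    record[:finished] = true if record
    record
  end
end

processor = SketchSpanProcessor.new
root  = OtelSpan.new(span_id: "a1", parent_span_id: nil,  name: "GET /users")
child = OtelSpan.new(span_id: "b2", parent_span_id: "a1", name: "SELECT users")

processor.on_start(root)
processor.on_start(child)          # becomes a child span of the transaction
child_record = processor.on_finish(child)
root_record  = processor.on_finish(root)

root_record[:type]   # => :transaction
child_record[:type]  # => :span
```

In the real gem, finishing the root record is what hands the transaction to Sentry's worker thread for processing.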

If the child service uses the Sentry::Rack::CaptureExceptions middleware (see order caveat below), then its transactions will be linked inside Sentry with the parent service's transaction.

Caveats

  • sentry-rails is not yet supported (haven't looked into it yet).
  • Sentry.init must be run before OpenTelemetry::SDK.configure (Sentry's Net::HTTP monkey patch to add the sentry-trace header must run after OTel's monkey-patch that adds an HTTP span)
  • Sentry::Rack::CaptureExceptions must be before OpenTelemetry::Instrumentation::Rack::Middlewares::TracerMiddleware in the middleware stack (the hub and scope must be set up before OTel creates any spans)
  • Since Sentry spans are started and stopped in reaction to OTel spans starting and stopping, the durations reported might be slightly different. (This should be fixable by overriding the timestamps on the Sentry spans we create to match OTel's).
  • In the case where an OTel root span has a child span that starts after the root span has finished, this will be represented in Sentry as two transactions (one for the original root span, and one for the late child).
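For the two ordering caveats above, a minimal `config.ru` makes the constraint concrete. This is a sketch assuming both gems are installed and `Sentry.init` / `OpenTelemetry::SDK.configure` have already run in the stated order; the middleware class names are the ones referenced in the caveats.

```ruby
# config.ru — Sentry's middleware must come first so the hub and scope
# exist before OTel's tracer middleware creates any spans.
require "sentry-ruby"
require "opentelemetry/sdk"
require "opentelemetry/instrumentation/rack"

use Sentry::Rack::CaptureExceptions  # sets up hub/scope first
use OpenTelemetry::Instrumentation::Rack::Middlewares::TracerMiddleware

run ->(env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
```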

mjq-sentry avatar Aug 26 '22 04:08 mjq-sentry

Are span processors common in otel sdks? aka could we do this everywhere?

dcramer avatar Aug 26 '22 20:08 dcramer

It's a part of the core spec - https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#span-processor, and supported by every major SDK.

AbhiPrasad avatar Aug 26 '22 20:08 AbhiPrasad

This pull request has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

github-actions[bot] avatar Sep 20 '22 00:09 github-actions[bot]

ok I'm taking this up now

so let's talk about this auto_instrument_traces stuff first.

Setting auto_instrument_traces to false in Sentry.init stops Sentry from automatically creating any transactions or spans.

Currently only net/http and redis span creation is disabled by this flag. What do we want from this flag? Do we want to turn off ALL instrumentation by sentry if otel is present? To prevent otel/sentry from stepping on each others' toes, we should just no-op most of the performance part of the SDK (except sampling) and that's a bit more work. If that's what we want, I can think of what the best way to do that is.

sl0thentr0py avatar Oct 03 '22 12:10 sl0thentr0py

Ideally we should turn off all automatic instrumentation that Sentry does with this flag, but still keep the user's manual instrumentation. The reason I say this is that if we turn off all instrumentation, we'll have API issues, because we need to access startTransaction inside the collector itself.

Turning off all instrumentation is maybe cleaner though, because we can just instruct folks to switch to OpenTelemetry usage completely. Perhaps it's easier to start with turning off all automatic instrumentation? And then we can see how that looks/feels?

AbhiPrasad avatar Oct 03 '22 12:10 AbhiPrasad

Codecov Report

Base: 98.33% // Head: 98.39% // Increases project coverage by +0.06% :tada:

Coverage data is based on head (8a1c99f) compared to base (5bfdf80). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1876      +/-   ##
==========================================
+ Coverage   98.33%   98.39%   +0.06%     
==========================================
  Files         151      156       +5     
  Lines        9306     9669     +363     
==========================================
+ Hits         9151     9514     +363     
  Misses        155      155              
Impacted Files Coverage Δ
sentry-opentelemetry/lib/sentry-opentelemetry.rb 100.00% <100.00%> (ø)
...entelemetry/lib/sentry/opentelemetry/propagator.rb 100.00% <100.00%> (ø)
...lemetry/lib/sentry/opentelemetry/span_processor.rb 100.00% <100.00%> (ø)
...metry/spec/sentry/opentelemetry/propagator_spec.rb 100.00% <100.00%> (ø)
...y/spec/sentry/opentelemetry/span_processor_spec.rb 100.00% <100.00%> (ø)
...ls/lib/sentry/rails/tracing/abstract_subscriber.rb 74.07% <100.00%> (ø)
...ntry/rails/tracing/action_controller_subscriber.rb 100.00% <100.00%> (ø)
...b/sentry/rails/tracing/active_record_subscriber.rb 100.00% <100.00%> (ø)
.../sentry/rails/tracing/active_storage_subscriber.rb 100.00% <100.00%> (ø)
sentry-ruby/lib/sentry/configuration.rb 98.47% <100.00%> (+0.03%) :arrow_up:
... and 4 more


codecov-commenter avatar Oct 10 '22 13:10 codecov-commenter

ok, this is backwards compatible now with a new config.instrumenter option which defaults to :sentry.
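The gating idea behind config.instrumenter can be sketched as follows. This is a hedged illustration, not the gem's internals: `SketchConfig` and `with_child_span` are hypothetical names, and the hash stands in for a real Sentry span. The point is only that every automatic-instrumentation helper checks one switch, so `:otel` no-ops Sentry's own span creation while still running the wrapped code.

```ruby
# Illustrative config object with the discussed default of :sentry.
class SketchConfig
  attr_accessor :instrumenter
  def initialize
    @instrumenter = :sentry # backwards-compatible default
  end
end

CONFIG = SketchConfig.new

# Hypothetical helper used by automatic instrumentation (net/http, redis, ...).
def with_child_span(op)
  # When another instrumenter (e.g. OTel) owns tracing, skip span creation
  # but still execute the instrumented code.
  return yield(nil) unless CONFIG.instrumenter == :sentry

  span = { op: op }
  result = yield(span)
  span[:finished] = true
  result
end

CONFIG.instrumenter = :otel
with_child_span("db.sql.active_record") { |span| span } # => nil, no Sentry span

CONFIG.instrumenter = :sentry
with_child_span("db.sql.active_record") { |span| span[:op] } # => "db.sql.active_record"
```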

sl0thentr0py avatar Oct 10 '22 13:10 sl0thentr0py

@st0012, @AbhiPrasad this is very big now but it's ready to ship I think. We can fix small things and iterate on them later after shipping. Please let me know if you have questions! I will comment on the files themselves to explain some decisions for @st0012.

I still wanna write some integration tests but you guys can start reviewing whenever you have time.


Summary

  • new gem sentry-opentelemetry to be used as described here
  • all Sentry instrumentation can be disabled via a new config.instrumenter option so that Sentry and otel do not step on each other's toes
  • the 2 main things are
    • a SpanProcessor that hooks into on_start/on_finish whenever OpenTelemetry starts/finishes a span. This in turn creates sentry transactions/spans as necessary and keeps track of them in an internal span_map hash.
    • a Propagator to make sure distributed tracing and dynamic sampling work by hooking into inject/extract for incoming/outgoing sentry-trace/baggage header management.
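The inject/extract halves of the Propagator can be sketched with the documented `sentry-trace` header format (`trace_id-span_id-sampled`). The class below is a stand-in, not the gem's API; only the header wire format is taken from Sentry's tracing docs, and the `baggage` half is omitted for brevity.

```ruby
# sentry-trace header: 32 hex chars trace id, 16 hex chars span id,
# optional trailing sampled flag ("1" or "0").
SENTRY_TRACE_REGEX = /\A([0-9a-f]{32})-([0-9a-f]{16})-?([01])?\z/

class SketchPropagator
  # Write the sentry-trace header into an outgoing carrier (e.g. HTTP headers).
  def inject(carrier, trace_id:, span_id:, sampled:)
    carrier["sentry-trace"] = "#{trace_id}-#{span_id}-#{sampled ? 1 : 0}"
  end

  # Parse an incoming carrier back into trace context; nil if absent/invalid.
  def extract(carrier)
    match = SENTRY_TRACE_REGEX.match(carrier["sentry-trace"].to_s)
    return nil unless match
    { trace_id: match[1], parent_span_id: match[2], sampled: match[3] == "1" }
  end
end

prop = SketchPropagator.new
headers = {}
prop.inject(headers, trace_id: "f" * 32, span_id: "a" * 16, sampled: true)
ctx = prop.extract(headers)
# ctx now carries trace_id, parent_span_id and sampled for continuing the trace.
```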

Spec

https://develop.sentry.dev/sdk/performance/opentelemetry/

sl0thentr0py avatar Nov 18 '22 14:11 sl0thentr0py

Is the goal to let users use opentelemetry-ruby but send the events to Sentry?

Yes, exactly, we want to make it easier for users using otel to switch over to sentry for performance while continuing to use sentry for errors.

If that's true, I assume we want them to opt in to one side completely? E.g. not tracing some parts with OpenTelemetry and the rest with Sentry's current tracing APIs/integrations.

Yes, this is what the instrumenter option does, it's a switch between either sentry/otel. We might generalize this later depending on adoption but for now, this is what we will ship.

What concurrency model does opentelemetry-ruby support, and does it fit with our Hub/thread design?

The initial design was using scope but we removed that so our hub/scope management should not matter here anymore. All concurrency handling in otel is done by the Context and we just rely on them to do the right thing.

What'd happen if some Sentry's automatic traces slip in?

It's fine, not the worst. This is very much an MVP/experiment so we'll just fix it if someone reports a bug. I've tried to make sure that doesn't happen though.

Most of the changes in sentry-ruby could be extracted to a separate PR.

Apologies for the size, we were iterating on this a fair bit so I wasn't sure what shape it'd finally take. I'll split this up into smaller PRs later today. 👍

sl0thentr0py avatar Nov 22 '22 13:11 sl0thentr0py

Tracking list of new PRs:

  • https://github.com/getsentry/sentry-ruby/pull/1944
  • https://github.com/getsentry/sentry-ruby/pull/1945
  • https://github.com/getsentry/sentry-ruby/pull/1946
  • https://github.com/getsentry/sentry-ruby/pull/1947
  • https://github.com/getsentry/sentry-ruby/pull/1948

sl0thentr0py avatar Nov 22 '22 14:11 sl0thentr0py

closing in favour of split PRs

sl0thentr0py avatar Nov 23 '22 16:11 sl0thentr0py