dd-trace-rb
Support distributed tracing for Sidekiq
Is your feature request related to a problem? Please describe.
It would be great to know which systems enqueue Sidekiq jobs and link their traces to other services.
For example, Service 1 makes a request to Service 2, and Service 2 enqueues a Sidekiq job. Currently, we only see traces for Services 1 and 2, and the Sidekiq traces are disconnected.
Describe the goal of the feature
Ideally, it would be cool to see something like this:

It would also help with debugging when the same job can be enqueued from different places.
PS. Sorry if this is a duplicate, I tried to search but didn't find a similar issue.
I may try to do this, but I'll point out that the OTEL sidekiq library does proper propagation for sidekiq jobs using middleware that simply injects the trace info into the job:
Client: https://github.com/open-telemetry/opentelemetry-ruby-contrib/blob/main/instrumentation/sidekiq/lib/opentelemetry/instrumentation/sidekiq/middlewares/client/tracer_middleware.rb#L31
Server: https://github.com/open-telemetry/opentelemetry-ruby-contrib/blob/main/instrumentation/sidekiq/lib/opentelemetry/instrumentation/sidekiq/middlewares/server/tracer_middleware.rb#L31
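For reference, the gist of that approach is roughly this (a simplified sketch, not the exact linked code; the middleware name is made up):

class OtelClientMiddleware
  def call(_worker_class, job, _queue, _redis_pool)
    # Writes the current trace context (e.g. traceparent) into the job hash
    OpenTelemetry.propagation.inject(job)
    yield
  end
end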
👋 We've been discussing adding support like this soon, especially since we are going to push more OTEL work into the Ruby tracer.
The OTEL implementation that @jackweinbender linked is very similar to how one would accomplish it with ddtrace.
If you'd like this Sidekiq trace to look just like you linked above, here's how you'd do it using our public API:
require 'datadog/tracing/distributed/metadata/datadog'

def my_application_code
  dd_digest = {}
  # Serialize the active trace context (if any) into a plain hash
  Datadog::Tracing::Distributed::Metadata::Datadog.inject!(Datadog::Tracing.active_trace&.to_digest, dd_digest)
  # your_args stands in for the job's real arguments
  Worker.perform_async(your_args, dd_digest)
end

class Worker
  include Sidekiq::Worker

  def perform(your_args, dd_digest = nil)
    # Resume the enqueuing service's trace before doing the work
    Datadog::Tracing.continue_trace!(
      Datadog::Tracing::Distributed::Metadata::Datadog.extract(dd_digest)
    ) if dd_digest
  end
end
These methods are pretty resilient to empty or nil values, so this should work pretty safely.
The one quirk to keep in mind is that Datadog::Tracing.active_trace can be nil when there's no active span, hence the safe navigation in Datadog::Tracing.active_trace&.to_digest. So if you are testing this in a REPL, make sure to execute Datadog::Tracing::Distributed::Metadata::Datadog.inject!(Datadog::Tracing.active_trace&.to_digest, dd_digest) inside an open ddtrace span context.
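For example, a quick REPL check could look like this (a sketch; 'repl.test' is just an arbitrary operation name):

Datadog::Tracing.trace('repl.test') do
  dd_digest = {}
  Datadog::Tracing::Distributed::Metadata::Datadog.inject!(Datadog::Tracing.active_trace&.to_digest, dd_digest)
  puts dd_digest.inspect # now populated with trace/parent IDs
end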
Looks like the API has changed in v1.7.0 (https://github.com/DataDog/dd-trace-rb/pull/2352); here's my updated code:
class ClientTracingMiddleware
  include Sidekiq::ClientMiddleware

  def initialize
    @propagation = Datadog::Tracing::Distributed::Datadog.new(fetcher: Datadog::Tracing::Distributed::Fetcher)
  end

  def call(_worker_class, job, _queue, _redis_pool)
    # Write the active trace context into the job payload before it is enqueued
    @propagation.inject!(Datadog::Tracing.active_trace&.to_digest, job)
    yield
  end
end

class ServerTracingMiddleware
  include Sidekiq::ServerMiddleware

  def initialize
    @propagation = Datadog::Tracing::Distributed::Datadog.new(fetcher: Datadog::Tracing::Distributed::Fetcher)
  end

  def call(_worker, job, _queue, &block)
    # Read the trace context back out of the job and continue that trace
    Datadog::Tracing.continue_trace!(@propagation.extract(job), &block)
  end
end
Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add ClientTracingMiddleware
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.insert_before Datadog::Tracing::Contrib::Sidekiq::ServerTracer, ServerTracingMiddleware
  end
end
@sled I haven't been able to make it work - where did you place those classes? I'm using ddtrace v1.10.1, but when invoking a process that fires a Sidekiq job, I see nothing in the flame graph.
@taltcher I put the ClientTracingMiddleware and ServerTracingMiddleware in the lib/ folder of my rails project, the Sidekiq.configure_client and Sidekiq.configure_server are in config/initializers/sidekiq.rb
You can also put all of them inside the initializer. Also make sure to re-start your application server and sidekiq worker after code changes because the initializer runs only once at boot time.
I'd put some debug statements in the middleware's #call method to ensure it's getting called. Maybe also output the value of Datadog::Tracing.active_trace&.to_digest to make sure there is an active trace when enqueuing the job. The same goes for @propagation.extract(job); it should output the same values when performing the job.
All this middleware does is store some metadata (trace IDs) on the job data when enqueueing the job and load it again when performing the job.
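For example, a hypothetical debug version of the client middleware's #call could look like this (Sidekiq.logger is Sidekiq's standard logger):

def call(_worker_class, job, _queue, _redis_pool)
  digest = Datadog::Tracing.active_trace&.to_digest
  Sidekiq.logger.debug("digest before inject: #{digest.inspect}")
  @propagation.inject!(digest, job)
  Sidekiq.logger.debug("job payload after inject: #{job.inspect}")
  yield
end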
@sled cool, thanks! :-) let me check and see if it is working for me
👋 @llxff @sled @jackweinbender @taltcher , I just merged distributed tracing for sidekiq. It should be released soon!
@TonyCTHsu Brilliant! How will we enable it?
@TonyCTHsu - when will it be available to use?
Hey folks! :wave: Stepping in for Tony, he's out for a few days :)
The instructions for setting up sidekiq, including the new distributed_tracing option, are up on https://github.com/datadog/dd-trace-rb/blob/master/docs/GettingStarted.md#sidekiq .
(I just realized that mistakenly we forgot to trigger the process to also update the docs that show up on https://docs.datadoghq.com/ -- fix incoming as well).
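Per that docs page, enabling it should look roughly like this (a minimal sketch based on the 1.x configuration API; see the linked page for the full option list):

Datadog.configure do |c|
  c.tracing.instrument :sidekiq, distributed_tracing: true
end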
The sidekiq integration is available as part of the 1.11.0.beta1 release which is now available on rubygems.
We did a couple of other big changes in that release, so, out of an abundance of caution, we decided to do an extra public beta release before putting out 1.11.0 final. BUT we're very confident that 1.11.0.beta1 is solid, and I can recommend starting to use it today if you want to get the new goodies :)
Edit: The final 1.11.0 should be out in the next couple of weeks, if you'd prefer to wait!
Since this is now available, I'm going ahead and closing this ticket, but please don't take this as a sign we don't want to hear from y'all!
Feel free to always comment or open an issue, we want all the feedback we can get :)
Nice work, we had a custom middleware to add this and now we don't need it.
Important: Enabling distributed_tracing for asynchronous processing can result in drastic changes in your trace graph. Such cases include long running jobs, retried jobs, and jobs scheduled in the far future. Make sure to inspect your traces after enabling this feature.
Has anyone here considered the possibility of ONLY the first Sidekiq attempt (the web request going async immediately afterwards, for many apps) picking up the trace context, with further retries (in cases of errors) being their own traces? For many apps where errors are an edge case, this would seem like having our cake and eating it too (most of the time, at least).
This would not have the problem of "drastic changes" that result in a trace multiple days long, etc.
This is an interesting suggestion and can work for many use cases.
The only case I can think of where this approach wouldn't quite work is when a Sidekiq job is scheduled for the far future (many hours or more). This is something that can be detected, though.
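A rough sketch of that idea as a custom server middleware, building on the code above (this assumes Sidekiq's retry_count job key, which is only set once a job has been retried; illustrative only, not part of the integration):

class FirstAttemptTracingMiddleware
  include Sidekiq::ServerMiddleware

  def initialize
    @propagation = Datadog::Tracing::Distributed::Datadog.new(fetcher: Datadog::Tracing::Distributed::Fetcher)
  end

  def call(_worker, job, _queue, &block)
    # 'retry_count' is absent on the first attempt, so only that attempt
    # continues the enqueuing service's trace; retries start fresh traces
    if job['retry_count'].nil?
      Datadog::Tracing.continue_trace!(@propagation.extract(job), &block)
    else
      yield
    end
  end
end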