dd-trace-rb
Support distributed tracing for Sidekiq
Is your feature request related to a problem? Please describe.
It would be great to know which systems enqueue Sidekiq jobs and link their traces to other services.
For example, Service 1 makes a request to Service 2, and Service 2 enqueues a Sidekiq job. Currently, we only see traces for Services 1 and 2, and the Sidekiq traces are disconnected.
Describe the goal of the feature
Ideally, it would be cool to see something like this:

It would also help with debugging when the same job can be enqueued from different places.
PS. Sorry if this is a duplicate, I tried to search but didn't find a similar issue.
I may try to do this, but I'll point out that the OTEL sidekiq library does proper propagation for sidekiq jobs using middleware that simply injects the trace info into the job:
Client: https://github.com/open-telemetry/opentelemetry-ruby-contrib/blob/main/instrumentation/sidekiq/lib/opentelemetry/instrumentation/sidekiq/middlewares/client/tracer_middleware.rb#L31
Server: https://github.com/open-telemetry/opentelemetry-ruby-contrib/blob/main/instrumentation/sidekiq/lib/opentelemetry/instrumentation/sidekiq/middlewares/server/tracer_middleware.rb#L31
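For reference, the gist of that approach is roughly this (a simplified sketch, not the exact linked code; the middleware name is made up):

class OtelClientMiddleware
  def call(_worker_class, job, _queue, _redis_pool)
    # Writes the current trace context (e.g. traceparent) into the job hash
    OpenTelemetry.propagation.inject(job)
    yield
  end
end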
👋 We've been discussing adding support like this soon, especially since we are going to push more OTEL work into the Ruby tracer.
The OTEL implementation that @jackweinbender linked is very similar to how one would accomplish it with ddtrace.
If you'd like this Sidekiq trace to look just like you linked above, here's how you'd do it using our public API:
require 'datadog/tracing/distributed/metadata/datadog'

def my_application_code
  dd_digest = {}
  # Serialize the active trace context (if any) into a plain hash
  Datadog::Tracing::Distributed::Metadata::Datadog.inject!(Datadog::Tracing.active_trace&.to_digest, dd_digest)
  # your_args stands in for the job's real arguments
  Worker.perform_async(your_args, dd_digest)
end

class Worker
  include Sidekiq::Worker

  def perform(your_args, dd_digest = nil)
    # Resume the enqueuing service's trace before doing the work
    Datadog::Tracing.continue_trace!(
      Datadog::Tracing::Distributed::Metadata::Datadog.extract(dd_digest)
    ) if dd_digest
  end
end
These methods are pretty resilient to empty or nil values, so this should work pretty safely.
The one quirk to keep in mind is that Datadog::Tracing.active_trace can be nil when there's no active span, hence the safe navigation in Datadog::Tracing.active_trace&.to_digest. So if you are testing this in a REPL, make sure to execute Datadog::Tracing::Distributed::Metadata::Datadog.inject!(Datadog::Tracing.active_trace&.to_digest, dd_digest) inside an open ddtrace span context.
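For example, a quick REPL check could look like this (a sketch; 'repl.test' is just an arbitrary operation name):

Datadog::Tracing.trace('repl.test') do
  dd_digest = {}
  Datadog::Tracing::Distributed::Metadata::Datadog.inject!(Datadog::Tracing.active_trace&.to_digest, dd_digest)
  puts dd_digest.inspect # now populated with trace/parent IDs
end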
Looks like the API has changed in v1.7.0 (https://github.com/DataDog/dd-trace-rb/pull/2352); here's my updated code:
class ClientTracingMiddleware
  include Sidekiq::ClientMiddleware

  def initialize
    @propagation = Datadog::Tracing::Distributed::Datadog.new(fetcher: Datadog::Tracing::Distributed::Fetcher)
  end

  def call(_worker_class, job, _queue, _redis_pool)
    # Write the active trace context into the job payload before it is enqueued
    @propagation.inject!(Datadog::Tracing.active_trace&.to_digest, job)
    yield
  end
end

class ServerTracingMiddleware
  include Sidekiq::ServerMiddleware

  def initialize
    @propagation = Datadog::Tracing::Distributed::Datadog.new(fetcher: Datadog::Tracing::Distributed::Fetcher)
  end

  def call(_worker, job, _queue, &block)
    # Read the trace context back out of the job and continue that trace
    Datadog::Tracing.continue_trace!(@propagation.extract(job), &block)
  end
end
Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add ClientTracingMiddleware
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.insert_before Datadog::Tracing::Contrib::Sidekiq::ServerTracer, ServerTracingMiddleware
  end
end
@sled I haven't been able to make it work - where did you place those classes? I'm using ddtrace v1.10.1, but when invoking a process that fires a Sidekiq job, I see nothing in the flame graph.
@taltcher I put the ClientTracingMiddleware and ServerTracingMiddleware in the lib/ folder of my rails project, the Sidekiq.configure_client and Sidekiq.configure_server are in config/initializers/sidekiq.rb
You can also put all of them inside the initializer. Also make sure to re-start your application server and sidekiq worker after code changes because the initializer runs only once at boot time.
I'd put some debug statements in the middleware's #call method to ensure it's getting called. Maybe also output the value of Datadog::Tracing.active_trace&.to_digest to make sure there is an active trace when enqueuing the job. The same goes for @propagation.extract(job); it should output the same values when performing the job.
All this middleware does is store some metadata (trace IDs) on the job data when enqueueing the job and load it again when performing the job.
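For example, a hypothetical debug version of the client middleware's #call could look like this (Sidekiq.logger is Sidekiq's standard logger):

def call(_worker_class, job, _queue, _redis_pool)
  digest = Datadog::Tracing.active_trace&.to_digest
  Sidekiq.logger.debug("digest before inject: #{digest.inspect}")
  @propagation.inject!(digest, job)
  Sidekiq.logger.debug("job payload after inject: #{job.inspect}")
  yield
end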
@sled cool, thanks! :-) let me check and see if it is working for me
👋 @llxff @sled @jackweinbender @taltcher , I just merged distributed tracing for sidekiq. It should be released soon!
@TonyCTHsu Brilliant! How will we enable it?
@TonyCTHsu - when will it be available to use?
Hey folks! :wave: Stepping in for Tony, he's out for a few days :)
The instructions for setting up sidekiq, including the new distributed_tracing option, are up on https://github.com/datadog/dd-trace-rb/blob/master/docs/GettingStarted.md#sidekiq .
(I just realized that mistakenly we forgot to trigger the process to also update the docs that show up on https://docs.datadoghq.com/ -- fix incoming as well).
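Per that docs page, enabling it should look roughly like this (a minimal sketch based on the 1.x configuration API; see the linked page for the full option list):

Datadog.configure do |c|
  c.tracing.instrument :sidekiq, distributed_tracing: true
end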
The sidekiq integration is available as part of the 1.11.0.beta1 release which is now available on rubygems.
We did a couple of other big changes in that release, so, out of an abundance of caution, we decided to do an extra public beta release before putting out 1.11.0 final. BUT we're very confident that 1.11.0.beta1 is solid, and I can recommend starting to use it today if you want to get the new goodies :)
Edit: The final 1.11.0 should be out in the next couple of weeks, if you'd prefer to wait!
Since this is now available, I'm going ahead and closing this ticket, but please don't take this as a sign we don't want to hear from y'all!
Feel free to always comment or open an issue, we want all the feedback we can get :)
Nice work, we had a custom middleware to add this and now we don't need it.
Important: Enabling distributed_tracing for asynchronous processing can result in drastic changes in your trace graph. Such cases include long running jobs, retried jobs, and jobs scheduled in the far future. Make sure to inspect your traces after enabling this feature.
Has anyone here considered the possibility of ONLY the first Sidekiq attempt (the web request going async immediately afterwards, for many apps) picking up the trace context, with further retries (in cases of errors) being their own traces? For many apps where errors are an edge case, this would seem like having our cake and eating it too (most of the time, at least).
This would not have the problem of "drastic changes" that result in a trace multiple days long, etc.
This is an interesting suggestion and can work for many use cases.
The only case I can think of where this approach wouldn't quite work is when a Sidekiq job is scheduled for the far future (many hours or more). This is something that can be detected, though.
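A rough sketch of that idea as a custom server middleware, building on the code above (this assumes Sidekiq's retry_count job key, which is only set once a job has been retried; illustrative only, not part of the integration):

class FirstAttemptTracingMiddleware
  include Sidekiq::ServerMiddleware

  def initialize
    @propagation = Datadog::Tracing::Distributed::Datadog.new(fetcher: Datadog::Tracing::Distributed::Fetcher)
  end

  def call(_worker, job, _queue, &block)
    # 'retry_count' is absent on the first attempt, so only that attempt
    # continues the enqueuing service's trace; retries start fresh traces
    if job['retry_count'].nil?
      Datadog::Tracing.continue_trace!(@propagation.extract(job), &block)
    else
      yield
    end
  end
end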