dd-trace-rb icon indicating copy to clipboard operation
dd-trace-rb copied to clipboard

Datadog.tracer.active_correlation not available in Sidekiq logging

Open jeffblake opened this issue 4 years ago • 13 comments

With Sidekiq 5 (released in 2017), job logging is no longer a middleware, and now sits above all middleware in the stack.

I think this is why Datadog.tracer.active_correlation is not available in the new JobLogger, because the correlation is flushed in the Sidekiq::ServerTracer middleware.

Seems crazy that no one has noticed this before, so maybe I have it wrong?

My setup is like

Sidekiq.configure_server do |config|
  config.log_formatter = Logging::Formatters::Json.new
  config.options[:job_logger] = Logging::Sidekiq::JobLogger

    config.server_middleware do |chain|
    chain.add Logging::Sidekiq::Server::JobLogger
  end
end

Datadog.tracer.active_correlation is always 0 for space_id and trace_id in the JobLogger

Datadog.tracer.active_correlation in my Logging::Sidekiq::Server::JobLogger middleware has a span_Id, but it seems to be one of the inner spans, e.g. a postgres query.

I suppose I could grab the root span in my custom middleware, store it in Thread.current[:dd_trace] = Datadog.tracer.active_root_span and then retrieve it in the JobLogger class for logging, (with an ensure block to flush it) ? But since other people must have this problem and I can't find a documented solution, thought I'd bring it up here.

jeffblake avatar Jun 08 '20 03:06 jeffblake

FYI I don't think the stale tracer is the cause here. In the Sidekiq source, https://github.com/mperham/sidekiq/blob/master/lib/sidekiq/processor.rb#L161 You can see that the logging of jobs (JobLogger) "wraps" the middleware, which means that the tracer is started and finished, and then yielded back to the JobLogger (who now no longer has an active trace/span).

jeffblake avatar Jun 08 '20 18:06 jeffblake

Yeah, after looking at this I arrived at the same conclusion, I think we'll have to investigate this separately.

delner avatar Jun 09 '20 15:06 delner

@jeffblake we found this recently as well, I was wondering if you implemented the thread variable successfully?

wasabigeek avatar Aug 20 '20 06:08 wasabigeek

👋 @jeffblake and @wasabigeek, as @jeffblake pointed out, job_logger is invoked before any middleware are processed:

  1. job_logger is invoked here: https://github.com/mperham/sidekiq/blob/v5.2.9/lib/sidekiq/processor.rb#L126
  2. Middleware are invoked here: https://github.com/mperham/sidekiq/blob/v5.2.9/lib/sidekiq/processor.rb#L137 (which invokes this block)

Given that, your JobLogger#call method won't be able to reach a #active_correlation, as all middleware and job work is performed and finished when you yield from JobLogger#call: all interesting work is inaccessible to the job_logger unfortunately.

This seems to me like there's a two tiered systems of Sidekiq server-side hooks:

  1. The logger: only one instance exists.
  2. Sidekiq middleware: a chained collection of processors.

One approach is for us to move our Datadog middleware to outside the context of Sidekiq middleware: we could make ddtrace a job_logger, and properly wrap any user provided logger inside ours. The downside is that this is not very maintainable, specially across Sidekiq versions and specially when users have another instrumentation library similar to ddtrace in their code-base. Another option is to monkey-patch the top-level Sidekiq::Processor#dispatch method instead: this has the same downside of making it brittle and not interacting nicely with other instrumentation frameworks.

Whenever a library has proper middleware, like Sidekiq does, using them is normally a much better maintainability experience for both users and library maintainers.

The main issue I see in this case is that a custom job_logger could be a regular middleware, instead of a different class of processing logic inside Sidekiq.

I took a look at the interface provided by both Sidekiq::JobLogger and the middleware interface and, given all the information is available to the middleware as far as I can tell, my suggestion here would be to move the logic from the job_logger (Logging::Sidekiq::JobLogger) into your middleware (Logging::Sidekiq::Server::JobLogger), and make sure to add it to Sidekiq's server_middleware after ddtrace inserts our instrumentation middleware. ddtrace adds its middleware to the end of Sidekiq's server stack when the Datadog.configure{|c| c.use :sidekiq} block completes.

Let me know if I'm missing something in your use case that can't be covered by this suggestion.

marcotc avatar Aug 20 '20 19:08 marcotc

@marcotc Your analysis seems spot on. The sidekiq creator acknowledged this when I brought it up, but expressed no interest to go back to logging being a middleware (which is the better design in my opinion).

ddtrace could provide a custom JobLogger, and consumers of the gem could use this as a guideline in their own custom JogLogger if they choose to override it?

Regarding your last paragraph, correct me if I'm wrong, but doing this in middleware seems like a no-go, because all middleware thread-variables would be flushed before the JobLogger gets a chance to log the span/trace ID? Or are you saying move logging out of JobLogger altogether? I don't like that because I prefer to follow whatever design Sidekiq gives us, whether it's correct or not...

jeffblake avatar Aug 20 '20 21:08 jeffblake

Is this fixed?

icelynjennings avatar Oct 11 '21 21:10 icelynjennings

@icelynjennings if you use Sidekiq through Active Job, then it is sort of fixed. Most logs are correlated through Active Job integration. If, however, you're using Sidekiq "natively", then unfortunately it isn't fixed.

agrobbin avatar Oct 12 '21 01:10 agrobbin

Thank you for your ActiveJob solution @agrobbin. I wonder if the recent solving of this Sidekiq issue https://github.com/mperham/sidekiq/discussions/5021 via commit https://github.com/mperham/sidekiq/commit/90535ab104b6540104af548d15baacd1291ffb5f could somehow be leveraged to inherit Rails DD tracing into sidekiq logs

icelynjennings avatar Oct 13 '21 09:10 icelynjennings

@icelynjennings it'd be great if the functionality could be expanded, but I don't think the solution in https://github.com/mperham/sidekiq/commit/90535ab104b6540104af548d15baacd1291ffb5f actually does anything to make that easier. It broadcasts any Rails.logger output to Sidekiq.logger, but the log correlation happens within the Active Job job middleware stack, so the native Sidekiq logs will still not be available (since they occur outside of the Active Job job middleware stack).

agrobbin avatar Oct 13 '21 13:10 agrobbin

For some more context: what information are we missing today because of the lack of trace-logs correlation at the job logger level?

Is it only these three ("start", "done", "fail") log lines: https://github.com/mperham/sidekiq/blob/1f05bd60102419d12733afa267d96524ece8e756/lib/sidekiq/job_logger.rb#L11-L20. Or is there more rich data that is going uncaptured today?

marcotc avatar Oct 15 '21 17:10 marcotc

Hi @marcotc, I'm new to sidekiq so I cannot answer but a DataDog support ticket I had filed led me to believe that native sidekiq log correlation is not currently available in DD. What I'm missing at the moment is any traceID within our sidekiq job logs.

icelynjennings avatar Nov 05 '21 11:11 icelynjennings

For some more context: what information are we missing today because of the lack of trace-logs correlation at the job logger level?

Is it only these three ("start", "done", "fail") log lines: https://github.com/mperham/sidekiq/blob/1f05bd60102419d12733afa267d96524ece8e756/lib/sidekiq/job_logger.rb#L11-L20. Or is there more rich data that is going uncaptured today?

Just my personal experience: I've got a lot of work done within workers, multiple queues, multiple workers per queue, multiple threads per worker, but the only context I get in DD is that it's Worker.x.

We really need to be able to isolate what each individual thread in each worker in each queue is doing over time.

"Filter logs by thread x on worker y in queue z" is the real function I need.

highwaybobbery avatar Jan 14 '22 18:01 highwaybobbery