dd-trace-rb ActiveJob integration ignoring service_name

I have this setup:

  Datadog.configure do |config|
    config.tracing.instrument :rails, service_name: "rails-app", cache_service: "rails-cache"
    config.tracing.instrument :active_job, service_name: "active_job"
  end

However, ActiveJob traces end up in the rails-app service. I expected them to appear in a dedicated service called active_job.

This is using ddtrace 1.1.0.

Jun 14 '22 10:06 jorgemanrubia

This workaround worked:

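  # No-op override: stops the Rails integration from activating ActiveJob itself,
  # so the explicit `instrument :active_job` call controls its settings.
  module Datadog::Tracing::Contrib::Rails::Framework
    def self.activate_active_job!(datadog_config, rails_config)
    end
  end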

And then this worked:

  config.tracing.instrument :active_job, service_name: "active_job"

Jun 15 '22 10:06 jorgemanrubia

Before I address the behavior in this integration, first a little background on service_name changes in 1.0...

With the upgrade from 0.x to 1.0, we've changed the behavior of service naming. Whereas service was often used to approximate a package before, we now intend service to exclusively reflect the name of the application from which the span originated. This means every span generated by a Ruby application should reflect the configured c.service/DD_SERVICE name.

To support the groupings and visualizations for which service was previously used, we've added new tags to each integration: component maps to the package name (e.g. active_job) and peer.service maps to the user-defined name for the external service (e.g. billing-db). The hope is that these tags will fulfill the need for these categorical comparisons while allowing us to reclaim service and make its use consistent.

In keeping with this direction, although we expect to support service_name as an option for integrations which interact directly with external services (e.g. DBs, caches, HTTP etc), this value should now map onto peer.service instead of service for spans that wrap these external calls. In the case of internal spans (e.g. active_record.instantiation), the service should always reflect the application name (c.service/DD_SERVICE) and should not otherwise be configurable.
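To make the rules above concrete, here is a minimal Ruby sketch of how service, component, and peer.service relate for a given span under the 1.0 direction. SpanNaming and resolve_naming are hypothetical names used only for illustration; they are not ddtrace APIs, the component values are illustrative, and the tracer's real logic may differ.

```ruby
# Hypothetical model of the 1.0 naming policy; not part of ddtrace's API.
SpanNaming = Struct.new(:service, :component, :peer_service, keyword_init: true)

# app_service         - the application name (c.service / DD_SERVICE)
# component           - the integration's package name, e.g. "active_job"
# external            - true if the span wraps a call to an external service
# integration_service - the integration's service_name option, if set
def resolve_naming(app_service:, component:, external:, integration_service: nil)
  SpanNaming.new(
    # service always names the application that emitted the span.
    service: app_service,
    component: component,
    # service_name is honored only for external calls, and lands on peer.service.
    peer_service: external ? integration_service : nil
  )
end

# A span wrapping a database call (illustrative values).
db_call = resolve_naming(app_service: "rails-app", component: "mysql2",
                         external: true, integration_service: "billing-db")
# db_call.service is "rails-app"; db_call.peer_service is "billing-db"

# An internal span such as active_record.instantiation: service_name is ignored.
internal = resolve_naming(app_service: "rails-app", component: "active_record",
                          external: false, integration_service: "billing-db")
# internal.service is "rails-app"; internal.peer_service is nil
```

This models the target state only. Later in this thread the maintainers agree that, until component is surfaced in the UI, service_name should keep updating service for these integrations.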

Jun 15 '22 18:06 delner

If I understand it correctly, with ActiveJob, all the spans occur within the Rails process that enqueues the jobs. However, active_job itself neither connects to the datastore, nor does it actually run the jobs. Instead it's merely an adapter for queuing jobs. (It will use the target libraries to do that work, e.g. Sidekiq, Resque, etc.)

(Please correct me if I'm wrong on any of these points.)

This is what makes ActiveJob a little tricky. I'm of the mind that these spans shouldn't take or reflect a service_name, because they're "internal", and there should be instrumented libraries/drivers underneath that reflect the actual external operation (e.g. Redis).

However, if ActiveJob connects directly to datastores (without the use of another instrumented library/driver), or executes the jobs in its own thread/process, then maybe it should continue to support a service_name option.

This is a bit of an oversight on our part, and I think we need to correct/clarify this behavior accordingly (meaning remove the service_name option). As such, this is kind of a "won't fix". However, if you believe we'd be missing a key feature or use case by doing so, I'd love to hear more about it so we can make sure we're not creating gaps.

Jun 15 '22 18:06 delner

Thanks for the clarifications on the naming rationale, @delner. I think that helps, and it changes the mental model I had a bit 👍. I think a little problem right now is that the UI doesn't expose the "component" bit. I see it's a facet you can filter by, but it's not as visible as service.

I see active_job similar to active_record, which you can use with mysql or postgresql as the underlying database, or to the active_support cache, which you can configure with redis or memcache. Datadog lets you name the service for those frameworks, and I don't see how Active Job is conceptually different from database or cache access.

In our case, for some reason, the Resque APM instrumentation was tracing only a fraction of the jobs (roughly 1 in 100), while the active_job instrumentation traces most of them, so we are going with active_job for now. A good thing about ActiveJob is that it is implemented internally using official Rails instrumentation hooks, while the Resque integration is monkey-patch based. Another problem we have found with both is that they don't instrument error states properly.
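The point about Rails instrumentation hooks can be illustrated with a plain-Ruby sketch of the publish/subscribe pattern behind ActiveSupport::Notifications, which ActiveJob uses to publish events such as perform.active_job. MiniNotifications, HookedJob, ChargeJob, ReportJob, and RECORDED are hypothetical stand-ins, simplified so the example runs without Rails; the real API's payload keys and timing details differ.

```ruby
# Plain-Ruby stand-in for ActiveSupport::Notifications (hypothetical, simplified).
module MiniNotifications
  @subscribers = Hash.new { |hash, event| hash[event] = [] }

  # Register a listener for a named event.
  def self.subscribe(event, &listener)
    @subscribers[event] << listener
  end

  # Run the block, then report the event to every listener,
  # attaching the exception to the payload if the block raised.
  def self.instrument(event, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield payload
  rescue StandardError => e
    payload[:exception] = e
    raise
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @subscribers[event].each { |listener| listener.call(event, elapsed, payload) }
  end
end

# Hypothetical job base class that publishes its own event around #perform,
# similar in spirit to ActiveJob publishing "perform.active_job".
class HookedJob
  def perform_now
    MiniNotifications.instrument("perform.job", job_class: self.class.name) { perform }
  end
end

class ReportJob < HookedJob
  def perform
    :ok
  end
end

class ChargeJob < HookedJob
  def perform
    raise ArgumentError, "card declined"
  end
end

# A tracer only needs to subscribe; it never reopens the job classes.
RECORDED = []
MiniNotifications.subscribe("perform.job") do |_event, _elapsed, payload|
  RECORDED << { job_class: payload[:job_class], error: payload[:exception]&.message }
end

ReportJob.new.perform_now
begin
  ChargeJob.new.perform_now
rescue ArgumentError => e
  # The error still reaches the caller; the subscriber saw it as well.
  puts "caller still sees: #{e.message}"
end
```

The design point: a subscriber depends only on the event name and payload the framework publishes, so it keeps working as long as the framework keeps publishing that event. A monkey-patch-based integration instead wraps library methods directly, which ties it to the library's internal method names.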

Jun 15 '22 20:06 jorgemanrubia

> I think a little problem right now is that the UI doesn't expose the "component" bit.

Then this is something we should fix on our end. I'm currently advocating for this change; any suggestions you have as to how you would expect it to be made visible/accessible in the UI are welcome. They help with writing user stories and framing the intent more clearly.

> I see active_job similar to active_record

Fair point. Consistency between these and active_support caching seems logical.

That considered, I think it makes sense to restore feature parity first: all three of these libs should respect service_name and update service... for now. Thus we should fix this. But as soon as we fill the gap in the UI (regarding component views), we will flip all of them over to the aforementioned peer.service tag instead.

Regarding resque, something does seem off (from the reports we're seeing): this is a suspected bug in 1.x. I want to address that in a separate issue, though. As for this one, I think we will consider it resolved once the non-functioning service_name option is fixed.

Does that sound good?

Jun 16 '22 03:06 delner

Sounds perfect @delner. Thanks!

Jun 16 '22 05:06 jorgemanrubia
