dd-trace-rb
ActiveJob integration ignoring service_name
I have this setup:
config.tracing.instrument :rails, service_name: "rails-app", cache_service: "rails-cache"
config.tracing.instrument :active_job, service_name: "active_job"
However, ActiveJob traces end up in the rails-app service. I expected them to appear in a dedicated service called active_job.
This is using ddtrace 1.1.0.
This workaround worked:
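module Datadog::Tracing::Contrib::Rails::Framework
  # Override as a no-op so the Rails integration skips its own ActiveJob
  # activation, leaving the explicit :active_job configuration in effect.
  def self.activate_active_job!(datadog_config, rails_config)
  end
end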
And then this worked:
config.tracing.instrument :active_job, service_name: "active_job"
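The workaround relies on plain Ruby semantics: reopening a module and redefining a singleton method replaces the original definition, so every later call gets the no-op. The sketch below models this with a stand-in module. `Framework`, `ACTIVATIONS`, and the recorded hash are hypothetical, and it assumes (not confirmed in this thread) that ddtrace's Rails integration called `activate_active_job!` to configure ActiveJob with the Rails service name.

```ruby
# Stand-in for Datadog::Tracing::Contrib::Rails::Framework.
# Hypothetical model only; this is not ddtrace's actual code.
module Framework
  ACTIVATIONS = []

  # Assumed behavior: activate ActiveJob using the Rails service name.
  def self.activate_active_job!(datadog_config, rails_config)
    ACTIVATIONS << { integration: :active_job, service_name: rails_config[:service_name] }
  end
end

Framework.activate_active_job!({}, { service_name: "rails-app" })
before_patch = Framework::ACTIVATIONS.dup

# The workaround: reopen the module and redefine the method as a no-op.
module Framework
  def self.activate_active_job!(datadog_config, rails_config)
  end
end

Framework.activate_active_job!({}, { service_name: "rails-app" })
after_patch = Framework::ACTIVATIONS.dup

puts before_patch.size # 1
puts after_patch.size  # 1 (the patched method recorded nothing new)
```

Because the original method never runs after the patch, only the user's own `instrument :active_job` call determines that integration's settings.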
Before I address the behavior in this integration, first a little background on the service_name changes in 1.0...
With the upgrade from 0.x to 1.0, we've changed the behavior of service naming. Whereas service was often used to approximate a package before, we now intend service to exclusively reflect the name of the application from which the span originated. This means every span generated by a Ruby application should reflect the configured c.service/DD_SERVICE name.
To preserve the groupings and visualizations that previously relied on service, we've added two new tags to each integration: component, which maps to the package name (e.g. active_job), and peer.service, which maps to the user-defined name of the external service (e.g. billing-db). We hope these tags will cover those categorical comparisons while allowing us to reclaim service and make its use consistent.
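To make the intent concrete, here is a plain-Ruby sketch (no ddtrace dependency) of spans tagged under the 1.x scheme and then grouped by component rather than by service. The `Span` struct, the sample operation names, and the jobs-redis peer name are hypothetical; only the tag names service, component, and peer.service come from the discussion above.

```ruby
# Hypothetical span model: service is always the application name, while
# component and peer.service carry the categorical information.
Span = Struct.new(:name, :service, :tags, keyword_init: true)

APP_SERVICE = "rails-app" # stands in for c.service / DD_SERVICE

spans = [
  Span.new(name: "active_job.enqueue", service: APP_SERVICE,
           tags: { "component" => "active_job" }),
  Span.new(name: "redis.command", service: APP_SERVICE,
           tags: { "component" => "redis", "peer.service" => "jobs-redis" }),
  Span.new(name: "active_job.enqueue", service: APP_SERVICE,
           tags: { "component" => "active_job" })
]

# Grouping by service lumps every span into one bucket...
by_service = spans.group_by(&:service).transform_values(&:count)
# ...while grouping by component recovers the per-library breakdown.
by_component = spans.group_by { |s| s.tags["component"] }.transform_values(&:count)

# by_service:   "rails-app" with 3 spans
# by_component: "active_job" with 2 spans, "redis" with 1
p by_service
p by_component
```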
In keeping with this direction, although we expect to keep supporting service_name as an option for integrations that interact directly with external services (e.g. DBs, caches, HTTP, etc.), this value should now map onto peer.service instead of service for spans that wrap these external calls. For internal spans (e.g. active_record.instantiation), service should always reflect the application name (c.service/DD_SERVICE) and should not otherwise be configurable.
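The rule above can be written as a small decision function. This is a hypothetical model, not ddtrace's implementation: `resolve_tags`, the `external:` flag, and the argument names are assumptions made for illustration.

```ruby
# Hypothetical model of the 1.x naming rule described above.
# - service is always the application's name (c.service / DD_SERVICE).
# - an integration's service_name option only matters for spans that wrap
#   calls to an external service, where it becomes peer.service.
def resolve_tags(app_service:, component:, external:, service_name: nil)
  tags = { "service" => app_service, "component" => component }
  tags["peer.service"] = service_name if external && service_name
  tags
end

db_call = resolve_tags(app_service: "rails-app", component: "mysql2",
                       external: true, service_name: "billing-db")
internal = resolve_tags(app_service: "rails-app", component: "active_record",
                        external: false, service_name: "ignored")

# db_call keeps service "rails-app" and gains peer.service "billing-db";
# internal (e.g. active_record.instantiation) ignores service_name entirely.
p db_call
p internal
```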
If I understand correctly, with ActiveJob all the spans occur within the Rails process that enqueues the jobs. However, active_job itself neither connects to the datastore nor actually runs the jobs; it's merely an adapter for queuing them. (It uses the target libraries, e.g. Sidekiq or Resque, to do that work.)
(Please correct me if I'm wrong on any of these points.)
This is what makes ActiveJob a little tricky. I'm of the mind that these spans shouldn't take or reflect a service_name, because they're "internal", and there should be instrumented libraries/drivers underneath that reflect the actual external operation (e.g. Redis).
However, if ActiveJob connects directly to datastores (without going through another instrumented library/driver), or executes the jobs in its own thread/process, then maybe it should continue to support a service_name option.
This is a bit of an oversight on our part, and I think we need to correct/clarify this behavior accordingly (meaning remove the service_name option). As such, this is kind of a "won't fix". However, if you believe we'd be missing a key feature or use case by doing so, I'd love to hear more about it so we can make sure we're not creating gaps.
Thanks for clarifying the rationale behind the naming, @delner; that helps, and it changes my mental model a bit 👍. I think a little problem right now is that the UI doesn't expose the "component" bit. I see it's a facet you can filter by, but it's not as visible as service.
I see active_job similar to active_record, which you can use with mysql or postgresql as the underlying database, or to the active_support cache, which you can configure with redis or memcache. Datadog lets you name the service for those frameworks, and I don't see how Active Job is different, conceptually speaking, from database access or cache access.
In our case, for some reason, the resque APM instrumentation was tracing only a fraction of the jobs (about 1 in 100), whereas the active_job instrumentation traces most of them, so we're going with active_job for now. A good thing about the ActiveJob instrumentation is that it's implemented using Rails' official instrumentation hooks, while the Resque one is monkey-patch-based. Another problem we've found with both is that they don't instrument error states properly.
I think a little problem right now is that the UI doesn't expose the "component" bit.
Then this is something we should fix on our end. I'm currently advocating for this change; any suggestions you have as to how you would expect it to be made visible/accessible in the UI are welcome. They help with writing user stories and framing the intent more clearly.
I see active_job similar to active_record
Fair point. Consistency between these and active_support caching seems logical.
With that in mind, I think it makes sense to restore feature parity first: all three of these libraries should respect service_name and update service... for now. So we should fix this. But as soon as we fill the gap in the UI (regarding component views), we will flip all of them over to the aforementioned peer.service tag instead.
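The two-phase plan can be sketched as a toggle. In the interim phase, service_name on active_record, active_job, and active_support caching replaces service; once component views are available in the UI, the same option would populate peer.service instead. The `span_service_tags` helper and its `phase:` argument are hypothetical; the thread does not describe any such switch in ddtrace.

```ruby
# Hypothetical sketch of the two phases described above; not a ddtrace API.
def span_service_tags(app_service:, service_name:, phase:)
  case phase
  when :interim
    # Restore parity: a configured service_name replaces the span's service.
    { "service" => service_name || app_service }
  when :peer_service
    # Later: service stays the app name, service_name moves to peer.service.
    tags = { "service" => app_service }
    tags["peer.service"] = service_name if service_name
    tags
  else
    raise ArgumentError, "unknown phase: #{phase.inspect}"
  end
end

now   = span_service_tags(app_service: "rails-app", service_name: "active_job", phase: :interim)
later = span_service_tags(app_service: "rails-app", service_name: "active_job", phase: :peer_service)

p now   # service is "active_job"
p later # service is "rails-app", peer.service is "active_job"
```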
Regarding resque, something does seem off (from the reports we're seeing): this is a suspected bug in 1.x. I want to address that in a separate issue, though. As for this one, I think we will consider it resolved once the non-functioning service_name option is fixed.
Does that sound good?
Sounds perfect @delner. Thanks!