dd-trace-rb
dd-trace-rb copied to clipboard
Propagate trace through Concurrent::Promises.future
Argument passing is preserved through the composited executor.
Concurrent::Future
has been moved to a prepend style to resolve an infinite call loop situation.
Duh, one lousy test on my part. Will fix.
avoid (potential?) infinite recursion
That triggered during a rspec run, so unless the behaviour is spec-induced it had a chance of happening for real.
Having all these changes together made it quite hard for me to review the PR, as I had to separate them out for myself to understand why each change was there and why it was needed.
It may not be worth doing the separation after the fact for this PR (up to you), but I'd really like to see such a separation in the future, as it would make non-trivial changes such as this one a lot easier to review.
I agree. It turns out the order of discovering the issues and corresponding fixes was basically reversed from what would be a sensible sequence of changes for review, and by that point it wasn't quite straightforward to retcon these into non-broken individual commits, which I feel sorry for not doing.
I tried to make the code comments relevant, but I could definitely have been more explicit in the commit message/PR description though.
That triggered during a rspec run, so unless the behaviour is spec-induced it had a chance of happening for real.
I'd be curious to see what happened -- was the patch applied more than once? Just to make sure we're fixing it properly with the prepend version.
I agree. It turns out the order of discovering the issues and corresponding fixes was basically reversed from what would be a sensible sequence of changes for review, and by that point it wasn't quite straightforward to retcon these into non-broken individual commits, which I feel sorry for not doing.
No worries, there's always the ideal and then there's reality. Hope you don't mind if I poke when it made my life harder, and please do the same and get revenge on my PRs as well :)
I'd be curious to see what happened -- was the patch applied more than once? Just to make sure we're fixing it properly with the prepend version.
Yup, now that you mention it, the after
step wasn't cleaning up correctly (now fixed), so the replaced method was the same as the replacing, and it looped. It could also happen in the now classic case of someone else using prepend
on the same method, or a variety of other situations involving inheritance, for which the solution would be to use prepend
through and through anyway.
Hope you don't mind if I poke when it made my life harder
No offense taken! It was definitely called for :)
The JRuby 9.2.0.0 test failure looks like it is deterministic -- https://app.circleci.com/pipelines/github/DataDog/dd-trace-rb/3520/workflows/c571f62b-cf7f-44ed-bd05-2d134b54a8a8/jobs/138556/tests#failed-test-0 .
We have 9.2.0.0 because of #1409 but to be honest I'm not entirely a fan. At most, I'd suggest just skipping support for whatever is broken on those JRuby versions, and printing out a warning "did not patch concurrent ruby... your JRuby version is too old, please update".
Regarding the lint failure, rebasing with master will fix it now.
Rebased!
Can you look into https://github.com/DataDog/dd-trace-rb/pull/1522#issuecomment-849703453 as well?
Can you look into #1522 (comment) as well?
I was just typing it out ;)
I'm not entirely a fan
I wasn't aware of this PR, and I'm not either, not the least because I had to face non-conformant breakage already on JRuby. TBH I'd rather officially support only latest for each minor, given the regular stream of "oops we were not quite conformant on that not-so-corner-case" fixes that regularly land at JRuby. Not my call though.
At most, I'd suggest just skipping support for whatever is broken on those JRuby versions, and printing out a warning "did not patch concurrent ruby... your JRuby version is too old, please update".
Is there a sanctioned DSL-ish way to do this, or is this kind of thing made of hardcoded checks?
I wasn't aware of this PR, and I'm not either, not the least because I had to face non-conformant breakage already on JRuby. TBH I'd rather officially support only latest for each minor, given the regular stream of "oops we were not quite conformant on that not-so-corner-case" fixes that regularly land at JRuby. Not my call though.
Pfft I'll see your issue and raise you this one 🤣
For real though, since this looks like one of those "it breaks your app" kind of deal, I think it's worth doing the small detour, since people are known to get stuck on random versions and not update them.
(But I also don't think we should spend too much time searching for new ones -- e.g. let's not add all minor versions of JRuby and MRI to CI)
Is there a sanctioned DSL-ish way to do this, or is this kind of thing made of hardcoded checks?
#1503 had a few examples that got removed recently. My specific suggestion would be to: find out which JRuby version fixed it (binary search 9.2 releases?) and then do something like
(in concurrent_ruby/integration.rb)
def self.compatible?
super && version >= MINIMUM_VERSION && !broken_jruby?
end
Thanks Ivo, I'll look into these examples and attempt something.
@lloeki any update on this one?
We faced with the same issue. We use the graphql-ruby with a new usage of Concurrent::Promises.future
and our spans are not showing in a span list.
We have added a monkey patch into our project.
- ruby 3.2.2
- ddtrace (1.14.0)
- concurrent-ruby (1.2.2)
- concurrent-ruby-ext (1.2.2)
# frozen_string_literal: true
require 'datadog/tracing/contrib/concurrent_ruby/configuration/settings'
require 'concurrent/executor/executor_service'
module Datadog
module Contrib
module ConcurrentRuby
# This patches the Future - to wrap executor service using ContextCompositeExecutorService
class ContextCompositeExecutorService
extend Forwardable
include Concurrent::ExecutorService
attr_accessor :composited_executor
def initialize(composited_executor)
@composited_executor = composited_executor
end
def post(*, &block)
tracer = Tracing.send(:tracer)
parent_context = tracer.provider.context
executor = @composited_executor.is_a?(Symbol) ? Concurrent.executor(@composited_executor) : @composited_executor
# Pass the original arguments to the composited executor, which
# pushes them (possibly transformed) as block args
executor.post(*) do |*block_args|
begin
original_context = tracer.provider.context
tracer.provider.context = parent_context
# Pass the executor-provided block args as they should have been
# originally passed without composition, see ChainPromise#on_resolvable
yield(*block_args)
ensure
# Restore context in case the current thread gets reused
tracer.provider.context = original_context
end
end
end
delegate [:can_overflow?, :serialized?] => :composited_executor
end
module PromisesFuturePatch
def future_on(default_executor, *args, &task)
unless default_executor.is_a?(ContextCompositeExecutorService)
default_executor = ContextCompositeExecutorService.new(default_executor)
end
super(default_executor, *args, &task)
end
end
class Integration
include Datadog::Tracing::Contrib::Integration
register_as :concurrent_ruby
def self.version
Gem.loaded_specs['concurrent-ruby'] && Gem.loaded_specs['concurrent-ruby'].version
end
def self.loaded?
# Concurrent::Future is deprecated in favour of Concurrent::Promises::Future
!defined?(::Concurrent::Promises::Future).nil? || !defined?(::Concurrent::Future).nil?
end
def self.compatible?
true
end
def default_configuration
Datadog::Tracing::Contrib::ConcurrentRuby::Configuration::Settings.new
end
def patcher
Patcher
end
end
module FuturePatch
def ns_initialize(value, opts)
super(value, opts)
@executor = ContextCompositeExecutorService.new(@executor)
end
end
module Patcher
include Datadog::Tracing::Contrib::Patcher
module_function
def target_version
Integration.version
end
def patch
patch_future
patch_promises_future
end
# Propagate tracing context in Concurrent::Future
def patch_future
::Concurrent::Future.prepend(FuturePatch) if defined?(::Concurrent::Future)
end
# Propagate tracing context in Concurrent::Promises::Future
def patch_promises_future
::Concurrent::Promises.singleton_class.prepend(PromisesFuturePatch) if defined?(::Concurrent::Promises::Future)
end
end
end
end
end
Works ok (we can see all spans): Example how to test with graphql-ruby:
# created a Promise.future
Concurrent::Promises.future do
# ex. faraday call
end
# define schema lazy_resolve for Future
lazy_resolve(Concurrent::Promises::Future, :value!)
@ivoanjo could you apply this monkey patch to the official release ? We tested on a project for a while and it works nice.
@Stashchenko do you mind opening an issue for it? I can use it to chase down a few people to look at it :)
@Stashchenko do you mind opening an issue for it? I can use it to chase down a few people to look at it :)
We have already issue https://github.com/DataDog/dd-trace-rb/issues/1316
I've requested the team take another look at this! Hopefully they'll be able to reply soon :)
The JRuby 9.2.0.0 test failure looks like it is deterministic -- https://app.circleci.com/pipelines/github/DataDg/dd-trace-rb/3520/workflows/c571f62b-cf7f-44ed-bd05-2d134b54a8a8/jobs/138556/tests#failed-test-0 .
We have 9.2.0.0 because of #1409 but to be honest I'm not entirely a fan. At most, I'd suggest just skipping support for whatever is broken on those JRuby versions, and printing out a warning "did not patch concurrent ruby... your JRuby version is too old, please update".
@ivoanjo @lloeki JRuby CI is currently passing (JRuby 9.2.0 is no longer tested), I think this is good to merge?
@ekump LGTM, thanks for picking it up!
Hey @Stashchenko, this change has just been released in 1.16.0. Let us know if this addresses your issue!
Hey @Stashchenko, this change has just been released in 1.16.0. Let us know if this addresses your issue!
Just checked, Unfortunately, doesn't work :(
With monkey patch:
With new 1.16 release:
In the second screenshot there is not faraday trace :(
Our setup:
# frozen_string_literal: true
require 'datadog/statsd'
require 'ddtrace'
require 'datadog/appsec'
require 'faraday'
Datadog.configure do |c|
service_name = 'my-service'
c.service = service_name
c.tracing.analytics.enabled = true
c.tracing.instrument(:concurrent_ruby)
c.tracing.instrument(:rails)
c.tracing.instrument(:redis, service_name: "#{service_name}-redis")
c.tracing.instrument(:active_support, cache_service: service_name)
c.tracing.instrument(:active_record, service_name: "#{service_name}-postgres")
c.tracing.instrument(:http, split_by_domain: true)
c.tracing.instrument(:faraday, service_name:)
c.runtime_metrics.enabled = true
c.runtime_metrics.statsd = Datadog::Statsd.new('localhost', (ENV['STATSD_PORT'] || '8120').to_i)
c.ci.enabled = ENV['CI'].present?
c.ci.instrument :rspec
c.appsec.instrument :rails
end
I tried to experiment with our monkey patch, and what I've found: I've made only changes to the ContextCompositeExecutorService#post
In initialize folder: ddtrace_concurrent_ruby_patch.rb
# frozen_string_literal: true
require 'concurrent/executor/executor_service'
module Datadog
module Tracing
module Contrib
module ConcurrentRuby
# This patches the Future - to wrap executor service using ContextCompositeExecutorService
class ContextCompositeExecutorService
extend Forwardable
include Concurrent::ExecutorService
attr_accessor :composited_executor
def initialize(composited_executor)
@composited_executor = composited_executor
end
# The post method runs the task within composited executor - in a
# different thread. The original arguments are captured to be
# propagated to the composited executor post method
def post(*)
tracer = Tracing.send(:tracer)
parent_context = tracer.provider.context
executor = @composited_executor
executor = Concurrent.executor(@composited_executor) if @composited_executor.is_a?(Symbol)
# Pass the original arguments to the composited executor, which
# pushes them (possibly transformed) as block args
executor.post(*) do |*block_args|
original_context = tracer.provider.context
tracer.provider.context = parent_context
# Pass the executor-provided block args as they should have been
# originally passed without composition, see ChainPromise#on_resolvable
yield(*block_args)
ensure
# Restore context in case the current thread gets reused
tracer.provider.context = original_context
end
end
delegate %i[can_overflow? serialized?] => :composited_executor
end
end
end
end
end
And it started working:
Without patch (not showing):
Looks like the issue for us is somewhere in the def post(*)
method when we get the tracer. I'm not aware how it works yet maybe it will be helpful for you. Otherwise, we have somewhere incompatible versions
👋 @Stashchenko, with the 1.16 change, you can see that there's a "floating" block of tracing (the one with a big execute.graphql
on the bottom right). This is because the instrumentation of Future
is now thread safe.
With new 1.16 release:
The original patch, where tracer.provider.context
is manually passed to another concurrent-ruby
context is not thread-safe. It's not possible for two threads to modify the same tracer.provider.context
in a sensible way. We added a test that demonstrate the issues caused by sharing a tracer.provider.context
amongst different Future
s (you get arbitrary threads creating spans that parent another thread's spans): https://github.com/DataDog/dd-trace-rb/pull/1522/files#diff-c44ebabaf45a6be710afd7ccb798cb12e505e0f0cbd30293c980b52cd5c12121R105-R136
The new implementation actually represents the accurate trace flow in a parallel application, and won't cause span parenting issues.
In the second screenshot there is not faraday trace :(
Now this is unexpected, given this change has nothing to do with Faraday. Could you let us know how Faraday is related to the rest of the trace you provided? It's very intriguing that a separate integration stopped working with this change.
👋 @Stashchenko, I was not able to reproduce your case.
Could you provide me something more to further investigate with the missing faraday span
The snippet I used to mimic your behavior:
require 'ddtrace'
require 'graphql'
require 'faraday'
require 'concurrent-ruby'
require 'ostruct'
require 'pry'
class GraphQL::Types::BaseObject < GraphQL::Schema::Object
end
class PersonType < GraphQL::Types::BaseObject
field :name, String, null: false
end
class PostType < GraphQL::Types::BaseObject
field :id, ID, null: false
field :author, PersonType
def author
Concurrent::Promises.future do
Faraday.get 'https://www.google.com'
OpenStruct.new(id: object.author_id, name: object.author_id)
end.value
end
end
class QueryType < GraphQL::Schema::Object
field :post, PostType do
argument :id, ID
end
# Then provide an implementation:
def post(id:)
OpenStruct.new(id: id, author_id: id)
end
end
class Schema < GraphQL::Schema
query QueryType
# define schema lazy_resolve for Future
lazy_resolve(Concurrent::Promises::Future, :value)
end
Datadog.configure do |c|
c.service = 'Bug Reproduction'
c.diagnostics.debug = true
c.tracing.instrument :faraday
c.tracing.instrument :concurrent_ruby
c.tracing.instrument :graphql, schemas: [Schema]
end
query_string = <<-GraphqlQueryString
{
p1: post(id: 1) { author { name } }
p2: post(id: 2) { author { name } }
p3: post(id: 3) { author { name } }
}
GraphqlQueryString
result_hash = Datadog::Tracing.trace("outer_span") do
Schema.execute(query_string)
end
Hi @TonyCTHsu , you should not get the value just in the same method call.
Concurrent::Promises.future do
Faraday.get 'https://www.google.com'
OpenStruct.new(id: object.author_id, name: object.author_id)
end.value # This piece
It's probably, immediately runs the query and nothing to do for lazy resolver.
One more step to reproduce. Try to get list of products with author's name:
query_string = <<-GraphqlQueryString
{
posts { author { name } }
}
GraphqlQueryString
That what we have in our code.
One more step I don't know what exactly does this line Faraday.get 'https://www.google.com'
. Does it execute faraday get or wait on real method call? (aka proxy object).
Smt like that should call the Faraday
Concurrent::Promises.future do
response = Faraday.get 'https://api.publicapis.org/entries'
value = JSON.parse(response.body).sample
OpenStruct.new(id: object.author_id, name: value['API'])
end
Hi @marcotc ,
Could you let us know how Faraday is related to the rest of the trace you provided?
It's not Faraday related :( Basically, any span I would like to create inside the Promise.future is not showing in a trace for us :(
Hi guys , tried to reproduce @TonyCTHsu example on a clean project.
class PersonType < BaseObject
field :name, String, null: false
end
class PostType < BaseObject
field :author, PersonType
field :id, ID, null: false
def author
Concurrent::Promises.future do
Datadog::Tracing.trace('example.trace') do
response = Faraday.get 'https://api.publicapis.org/entries'
value = JSON.parse(response.body)['entries'].sample
OpenStruct.new(id: object.author_id, name: value['API'])
end
end
end
end
field :posts, [PostType], null: false
def posts
Array.new(10) { OpenStruct.new(id: SecureRandom.uuid, author_id: SecureRandom.uuid) }
end
And result is: 🔥
Looks like something incompatible inside our project. Will dig deeper to compare what's the diff. Sorry for the confusion. And thanks a lot for support!