dd-trace-rb

[BUG]: unable to kill all the SystemStackError introduced by installing the gem

Open bf4 opened this issue 3 months ago • 4 comments

Tracer Version(s)

2.20.0

Ruby Version(s)

C ruby 3.3.6 (2024-11-05 revision 75015d4c1f)

Relevant Library and Version(s)

multiple

Bug Report

I'm creating a new issue because https://github.com/DataDog/dd-trace-rb/issues/2348, while still open, is 3 years old and reads like it's rack-mini-profiler specific.

Since adding gem "datadog", require: "datadog/auto_instrument" to our Gemfile, our application began, for the first time, regularly raising SystemStackError in both production and development.

In response, we added DD_TRACE_PG_ENABLED=false DD_TRACE_HTTP_ENABLED=false to our spec helper, since we couldn't get certain tests to run without it.

And we removed

  • rack-mini-profiler (we tried both prepend patches, require: ["prepend_pg_patch", "prepend_net_http_patch"]; this was only ever in the development group)
  • scout_apm
  • memory_profiler
  • stackprof

And changed

  • sniffer to require: ["all_prepend"] per https://github.com/palkan/isolator/issues/44 and https://github.com/aderyabin/sniffer/issues/64

And we're still seeing some methods trigger a SystemStackError from the datadog code, such as:

https://github.com/DataDog/dd-trace-rb/blob/v2.20.0/lib/datadog/tracing/span_operation.rb#L114-L117

gems/datadog-2.20.0/lib/datadog/tracing/span_operation.rb:120:in `name='

It's pretty frustrating that there's all this advice to change how we require other gems when the one common factor in the SystemStackErrors we see is the datadog gem.

We talked to our Datadog rep and created an internal ticket as well. I'll be sharing this GitHub issue with them.

Reproduction Code

No response

Configuration Block

No response

Error Logs

No response

Operating System

No response

How does Datadog help you?

No response

bf4, Sep 11 '25 22:09

@bf4 does the issue also occur if you change require: ["all_prepend"] to require: ["all_prepend", "sniffer"]?

y9v, Sep 12 '25 13:09

I think you may have misunderstood the intent of this bug report: the root cause of the exception is installing this gem, not the availability of the various require options for other gems. And yes, sniffer is not a problem when we adjust how it loads, but that's in response to a working application throwing errors once the datadog gem is installed.

bf4, Sep 12 '25 16:09

@bf4 I have not run into the infinite recursion issue you are describing here, but after investigating the background of this ticket and https://github.com/DataDog/dd-trace-rb/issues/2348, I see that https://github.com/DataDog/dd-trace-rb/blob/master/docs/GettingStarted.md#stack-level-too-deep pretty clearly states that:

  1. observability libraries patch other libraries
  2. there are two ways to patch: Module.prepend and alias_method
  3. Module.prepend permits multiple libraries to patch one target, alias_method generally does not
  4. dd-trace-rb uses Module.prepend

The issue happens when observability libraries use alias_method to patch and there are 2 or more libraries trying to patch the same target.
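To make the failure mode concrete, here is a minimal sketch of one ordering that is known to recurse on CRuby: a Module.prepend patch applied first, followed by an alias_method chain on the same method. The class and module names (Connection, PrependTracer, query_without_profiling) are hypothetical stand-ins and are not taken from dd-trace-rb or any of the gems mentioned in this thread.

```ruby
# Hypothetical stand-in for a library class that two tools both instrument.
class Connection
  def query
    "result"
  end
end

# Tool A patches with Module.prepend and delegates to the original via super.
module PrependTracer
  def query
    super
  end
end
Connection.prepend(PrependTracer)

# Tool B patches afterwards with an alias_method chain.
# Because PrependTracer sits in front of Connection in the ancestor chain,
# the alias captures PrependTracer#query rather than the original method.
class Connection
  alias_method :query_without_profiling, :query

  def query
    query_without_profiling
  end
end

# PrependTracer#query calls super, which reaches the redefined
# Connection#query, which calls the alias, which is PrependTracer#query
# again. The cycle only stops when the stack is exhausted.
def run_query
  Connection.new.query
rescue SystemStackError
  :stack_level_too_deep
end

outcome = run_query
puts outcome
```

If the two patches are applied in the opposite order (alias_method chain first, prepend second), the alias captures the original method and the call terminates normally, which is one reason the error can appear only after a prepend-based library is added.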

Module.prepend was added in Ruby 2.0, and dd-trace-rb currently requires at least Ruby 2.5. Ruby 2.0 is ancient, so there is no reason not to use Module.prepend. Any library that still uses alias_method by default should be updated to use Module.prepend instead, so that it does not produce the infinite recursion you are describing. Are these libraries maintained?
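For comparison, a small sketch of two independent patches that both use Module.prepend on the same method. The names (Client, TracerPatch, ProfilerPatch) are illustrative assumptions, not real library code.

```ruby
# Hypothetical target class patched by two independent tools.
class Client
  def query
    ["original"]
  end
end

# Each patch wraps the method and delegates with super.
module TracerPatch
  def query
    super + ["traced"]
  end
end

module ProfilerPatch
  def query
    super + ["profiled"]
  end
end

Client.prepend(TracerPatch)
Client.prepend(ProfilerPatch)

# Ancestor chain is now ProfilerPatch, TracerPatch, Client, so each
# super call moves one step closer to the original method.
calls = Client.new.query
puts calls.inspect
```

Each prepended module gets its own slot in the ancestor chain, so the wrappers stack without either one needing to know about the other.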

dd-trace-rb already uses Module.prepend. What are you suggesting we do?

I sympathize that you potentially need to configure or maybe make code changes to some of the other libraries you are using, but I don't see how we can improve the situation from the dd-trace-rb side.

p-datadog, Sep 22 '25 17:09

I remember a few years ago there was some discussion that Module.prepend was worse for performance than alias_method, and that many libraries would prefer to continue using alias_method for performance reasons. Is that something Datadog has looked into at all?
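One way to check this on a given Ruby version would be a micro-benchmark along the following lines. This is only a rough sketch with hypothetical class names; it measures call overhead of a trivial wrapper and says nothing about real instrumentation costs, so any numbers it prints should be treated with caution.

```ruby
# Wrapper built with an alias_method chain.
class AliasTarget
  def work
    1
  end

  alias_method :work_without_patch, :work

  def work
    work_without_patch
  end
end

# Wrapper built with Module.prepend.
class PrependTarget
  def work
    1
  end
end

module WorkPatch
  def work
    super
  end
end
PrependTarget.prepend(WorkPatch)

# Time a fixed number of calls using the monotonic clock.
def time_calls(object, iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { object.work }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

ITERATIONS = 200_000

timings = {
  alias_method: time_calls(AliasTarget.new, ITERATIONS),
  prepend: time_calls(PrependTarget.new, ITERATIONS),
}

timings.each { |style, seconds| puts format("%-12s %.4fs", style, seconds) }
```

Results will vary with Ruby version, JIT settings, and machine load, so it is worth running several times before drawing conclusions.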

nightpool, Sep 27 '25 08:09