dd-trace-rb
Trace garbage collections
This can be a big source of latency.
👍 Vitally important for larger applications. One of the main things missing since we switched over from NewRelic.
GC cannot be attributed to a single Ruby thread because it's a global VM event.
This means that we cannot attribute it to a single trace: it affects all traces active during a GC run.
This is not really a major issue, given the effect of GC will be felt across all active threads, so it's accurate to report GC runs in a trace: we are attributing the effect of GC to a trace, not its cause.
With that in mind, we can collect all GC information in Ruby today:
GC.stat(:minor_gc_count)
GC.stat(:major_gc_count)
GC.stat(:time) # Since Ruby 3.1
My recommendation is to add this information to the current trace as a metric.
The information should be collected as a difference, since the *gc_count and time values are globally incremented.
It would be something like: trace.set_metric('ruby.gc.minor_count', minor_gc_after - minor_gc_before).
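To make the "collect as a difference" idea concrete, here is a minimal sketch in plain Ruby. The helper name `measure_gc` and the metric names `ruby.gc.major_count` and `ruby.gc.time_ms` are hypothetical, chosen only for illustration. The commented-out last line assumes a `trace` object with the `set_metric` call mentioned above.

```ruby
# Hypothetical helper: runs the block and returns its result together with
# the GC counter deltas observed while the block was running.
def measure_gc
  before = GC.stat
  result = yield
  after = GC.stat

  deltas = {
    # Counters are process-wide and only ever increase, so we report deltas.
    'ruby.gc.minor_count' => after[:minor_gc_count] - before[:minor_gc_count],
    'ruby.gc.major_count' => after[:major_gc_count] - before[:major_gc_count]
  }
  # GC.stat(:time) is only available since Ruby 3.1, so guard for older Rubies.
  deltas['ruby.gc.time_ms'] = after[:time] - before[:time] if after.key?(:time)

  [result, deltas]
end

result, gc_metrics = measure_gc do
  # Allocate enough objects that a GC run is likely (not guaranteed).
  Array.new(100_000) { |i| i.to_s }.size
end

# Assumed integration point, not verified against the tracer's API:
# gc_metrics.each { |name, value| trace.set_metric(name, value) }
```

Because the counters are global, the deltas include GC runs triggered by other threads during the block, which matches the "effect, not cause" attribution described earlier.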
If the performance impact is acceptable, this can even be done at the per-span level, which would make it possible to pinpoint the exact span that was affected by GC. But the performance impact has to be carefully measured before choosing this approach.
I ran into this ticket again today, and actually we do have coverage for this now in the Datadog Ruby profiler.
Specifically, we call it GC profiling, and if you have the profiler enabled, you can enable it using the DD_PROFILING_FORCE_ENABLE_GC=true env variable or via code using c.profiling.advanced.force_enable_gc_profiling = true.
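For reference, the code-based option would typically live in the usual `Datadog.configure` block. This is a configuration sketch only, assuming a ddtrace version that still exposes this setting and an initializer that already loads the gem:

```ruby
require 'ddtrace'

Datadog.configure do |c|
  # The profiler itself must be on for GC profiling to have any effect.
  c.profiling.enabled = true
  # Setting named in the comment above; availability may depend on the ddtrace version.
  c.profiling.advanced.force_enable_gc_profiling = true
end
```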
You'll be able to see how much time is spent in GC in general:
...and even for individual requests or Ruby processes using the timeline view:
So yeah, give it a try and let us know!
P.S.: We've made some recent improvements to this feature, so I recommend trying it on the latest ddtrace.
With that said, I'm going to go ahead and close this one! Feedback is very welcome, btw :D