debug
debug copied to clipboard
Performance issue when stepping through a deep callstack
Your environment
ruby -v: 2.7.2p137rdbg -v: 1.6.2
Describe the bug When stepping through with a deep callstack, the debugger becomes slower with each step.
To Reproduce In a codebase which has a method that's ~150 calls deep in the stack, add a breakpoint, debug, and step through.
Expected behavior It wouldn't be slow
Additional context We are facing this issue at Stripe, and we tracked down the problem to: https://github.com/ruby/debug/blob/296daa6742e114395a4ce336b8721d48a6e5e52f/ext/debug/debug.c#L92-L98
This seems to be generating the full backtrace through rb_make_backtrace, just to then take the size. rb_make_backtrace (source) generates an array of strings each time it's called. Those strings end up not being used, and after a couple of iterations, you end up with a big memory footprint, which causes the GC to collect them, significantly slowing down the debugger.
Luckily, it looks like you are already aware that this is inefficient (given the // TODO: more efficient API) :)
Looking into alternatives, I came up with a couple of options (but open to any other ideas):
- Use
rb_threadptr_backtrace_objectto skip the formatting of the backtrace, saving some extra memory (but still generating more than necessary) debasesolves this with a different approach: https://github.com/ruby-debug/debase/blob/5780803819f16190ce50ddfdad5775acf8f95fd1/ext/hacks.h#L26
Both of them, though, require the inclusion of *_core.h, which is not currently possible (debase solves it by using its own version of ruby_core_source)
FYI, I think @WillHalto is already addressing this in #743 and #746
Yeah this issue makes stepping and finishing quite slow for large callstacks.
I think those ^ PRs should help quite a bit, with both changes together I measured a ~99% improvement in next performance https://github.com/ruby/debug/pull/743#issuecomment-1250131651 for an example case. Similar improvement for our internal use case in a large rails monolith.
However one thing to note is that the optimization in https://github.com/ruby/debug/pull/746 won't be applicable for ruby 2.7, as rb_profile_frames will skip C frames for ruby < 3.0 so I don't think we can use it for all versions.
Unfortunately, #473 doesn't seem to have a significant impact in the examples I've tried. It does reduce time/memory consumption, but if the callstack is mostly small methods (so the ration of :return to :line high), it doesn't make a difference.
Given that #746 is for ruby > 3.0, we can't take advantage of it, either.
edit: That being said, even if it doesn't make a big impact in the cases I've tried, I think these optimizations are great and DO cut down on a lot of unnecessary work :)
Gotcha, yeah #743 makes the most difference when the ratio of :line to :return is flipped.