debug icon indicating copy to clipboard operation
debug copied to clipboard

Performance issue when stepping through a deep callstack

Open marianosimone opened this issue 3 years ago • 4 comments

Your environment

  • ruby -v: 2.7.2p137
  • rdbg -v: 1.6.2

Describe the bug When stepping through with a deep callstack, the debugger becomes slower with each step.

To Reproduce In a codebase which has a method that's ~150 calls deep in the stack, add a breakpoint, debug, and step through.

Expected behavior It wouldn't be slow

Additional context We are facing this issue at Stripe, and we tracked down the problem to: https://github.com/ruby/debug/blob/296daa6742e114395a4ce336b8721d48a6e5e52f/ext/debug/debug.c#L92-L98

This seems to be generating the full backtrace through rb_make_backtrace, just to then take the size. rb_make_backtrace (source) generates an array of strings each time it's called. Those strings end up not being used, and after a couple of iterations, you end up with a big memory footprint, which causes the GC to collect them, significantly slowing down the debugger.

Luckily, it looks like you are already aware that this is inefficient (given the // TODO: more efficient API) :)

Looking into alternatives, I came up with a couple of options (but open to any other ideas):

  1. Use rb_threadptr_backtrace_object to skip the formatting of the backtrace, saving some extra memory (but still generating more than necessary)
  2. debase solves this with a different approach: https://github.com/ruby-debug/debase/blob/5780803819f16190ce50ddfdad5775acf8f95fd1/ext/hacks.h#L26

Both of them, though, require the inclusion of *_core.h, which is not currently possible (debase solves it by using its own version of ruby_core_source)

marianosimone avatar Sep 22 '22 23:09 marianosimone

FYI, I think @WillHalto is already addressing this in #743 and #746

st0012 avatar Sep 22 '22 23:09 st0012

Yeah this issue makes stepping and finishing quite slow for large callstacks.

I think those ^ PRs should help quite a bit, with both changes together I measured a ~99% improvement in next performance https://github.com/ruby/debug/pull/743#issuecomment-1250131651 for an example case. Similar improvement for our internal use case in a large rails monolith.

However one thing to note is that the optimization in https://github.com/ruby/debug/pull/746 won't be applicable for ruby 2.7, as rb_profile_frames will skip C frames for ruby < 3.0 so I don't think we can use it for all versions.

WillHalto avatar Sep 23 '22 00:09 WillHalto

Unfortunately, #473 doesn't seem to have a significant impact in the examples I've tried. It does reduce time/memory consumption, but if the callstack is mostly small methods (so the ration of :return to :line high), it doesn't make a difference.

Given that #746 is for ruby > 3.0, we can't take advantage of it, either.

edit: That being said, even if it doesn't make a big impact in the cases I've tried, I think these optimizations are great and DO cut down on a lot of unnecessary work :)

marianosimone avatar Sep 23 '22 23:09 marianosimone

Gotcha, yeah #743 makes the most difference when the ratio of :line to :return is flipped.

WillHalto avatar Sep 26 '22 22:09 WillHalto