debug Faster `next` by reducing number of frame

Description

Background

next can be quite slow when stepping over code. It appears this is a known issue due to expensive calls to DEBUGGER__.frame_depth that are made for every :line event when stepping.
- This check is made to determine if we have returned to the same stack depth as when next was called, since that's where we want to break.
This performance penalty is significant when stepping over code in a large codebase with a large call stack.
This PR improves the performance of stepping with next by restricting when frame_depth is called for :line events.

What this PR does

By not checking frame_depth for every single :line event when stepping, we can save significant time.
We can do this based on the following assumptions:
1. If we execute a line of code (aka, encounter a :line event) without breaking or reaching the same/lesser stack depth than where 'next' was called, it means we are at a deeper stack depth.
2. Removing frames from the stack corresponds to a :return or :b_return event.
3. So, if we pass a :line event without reaching the starting stack depth we can ignore all following :line events until we see a return or b_return event, since we will not be back to the same stack depth until some return event(s) occur.

Examples/validation

This file uses deep recursion with a method call to create a very large call stack:

# target.rb
def foo
  "hello"
end

def recursive(n,stop)
  foo
  return if n >= stop

  recursive(n + 1, stop)
end
  
recursive(0,1000) # line 13

We can benchmark a step over the call on line 13 with this command:

time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb

Results

Baseline - not stepping at all:

 $ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.241s
user    0m0.188s
sys     0m0.054s

Stepping before this change:

 $ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m2.188s
user    0m2.083s
sys     0m0.054s

Stepping after this change:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m1.147s
user    0m1.107s
sys     0m0.038s

So this change gives a roughly ~50% improvement in the performance of stepping for the example.

Co-authored-by: @jhawthorn

Aug 31 '22 15:08 WillHalto

This is an amazing optimization 👍

I feel we can also improve this part's readability a bit by moving relevant logic closer:

-              loc = caller_locations(2, 1).first
-              loc_path = loc.absolute_path || "!eval:#{loc.path}"
-
               stack_depth = DEBUGGER__.frame_depth - 3

               # If we're at a deeper stack depth, we can skip line events until there's a return event.
               skip_line = event == :line && stack_depth > depth

               # same stack depth
-              (stack_depth <= depth) ||
+              next true if stack_depth <= depth
+
+              loc = caller_locations(2, 1).first
+              loc_path = loc.absolute_path || "!eval:#{loc.path}"

And then we can further refactor the later part too:

               # different frame
-              (next_line && loc_path == path &&
-               (loc_lineno = loc.lineno) > line &&
-               loc_lineno <= next_line)
+              next_line && loc_path == path && loc.lineno.between?(line + 1, next_line)

So putting them all together:

-              loc = caller_locations(2, 1).first
-              loc_path = loc.absolute_path || "!eval:#{loc.path}"
-
               stack_depth = DEBUGGER__.frame_depth - 3

               # If we're at a deeper stack depth, we can skip line events until there's a return event.
               skip_line = event == :line && stack_depth > depth

               # same stack depth
-              (stack_depth <= depth) ||
+              next true if stack_depth <= depth
+
+              loc = caller_locations(2, 1).first
+              loc_path = loc.absolute_path || "!eval:#{loc.path}"

               # different frame
-              (next_line && loc_path == path &&
-               (loc_lineno = loc.lineno) > line &&
-               loc_lineno <= next_line)
+              next_line && loc_path == path && loc.lineno.between?(line + 1, next_line)

Aug 31 '22 20:08 st0012

@st0012 thanks for the feedback! That's a great refactor, I did that in https://github.com/ruby/debug/pull/743/commits/7a414919c42e1a9cc3854071cc2488e945bedcb3 👍

Sep 01 '22 20:09 WillHalto

Great work in these optimisations Will 😃 Are you thinking about optimising startup/connection times too? I'm thinking about that. Haven't looked too deeply but I'm interested in a minimal config that cuts out as many 'requires' as possible. It takes the best part of 10 seconds to get debugging on my underpowered pc with a pretty simple ruby program. Feels like it should be a couple of seconds really and practically instant on a good pc.

Sep 05 '22 13:09 jabamaus

Could you measure the time with https://github.com/ruby/debug/pull/746 ?

Sep 16 '22 09:09 ko1

Could you measure the time with #746 ?

@ko1 yes, here is that result:

Using the same example as above

# target.rb
def foo
  "hello"
end

def recursive(n,stop)
  foo
  return if n >= stop

  recursive(n + 1, stop)
end
  
recursive(0,1000) # line 13

Baseline - not stepping at all:

$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.252s
user    0m0.207s
sys     0m0.046s

Stepping before changes:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m2.370s
user    0m2.292s
sys     0m0.077s

Stepping with changes from https://github.com/ruby/debug/pull/743 and https://github.com/ruby/debug/pull/746

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.276s
user    0m0.228s
sys     0m0.048s

With both of these changes in place, there is a ~99% improvement in the next performance compared to the baseline.

Sep 17 '22 19:09 WillHalto

Ah, I'm sorry I need to ask you that "with #746 but without #743". I will merge #746 but I need to read details of #743 for review. But if #746 is enough, I can leave it.

Oct 04 '22 07:10 ko1

Ah, I'm sorry I need to ask you that "with #746 but without #743". I will merge #746 but I need to read details of #743 for review. But if #746 is enough, I can leave it.

@ko1 oh, I understand. Test result for #746 without #743 is in the PR description here: https://github.com/ruby/debug/pull/746#issue-1359369163 Under the Examples/validation header.

Let me know if you'd like me to run any other tests on either of these PRs! 👍

Oct 11 '22 19:10 WillHalto

Ah, thank you!

This is summary:

# baseline
 $ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.241s
user    0m0.188s
sys     0m0.054s

# only with #743 
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m1.147s
user    0m1.107s
sys     0m0.038s

# only with #746
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.285s
user    0m0.218s
sys     0m0.069s

# with #743 and #746
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.276s
user    0m0.228s
sys     0m0.048s

It seems #746 is dominant, so I'll merge only #746.

#743 (this PR) I can't fully guarantee it is correct.

after reading the code, your code is more readable and ... maybe no problem, so I'll merge it.

Oct 26 '22 08:10 ko1

Ah, I found an issue, it doesn't support multi-threads, so I'll revert it. Sorry.

Oct 26 '22 08:10 ko1

Ah, I found an issue, it doesn't support multi-threads, so I'll revert it. Sorry.

Ah, no problem, I agree with

It seems #746 is dominant

anyway. Is it worth trying to fix multi thread support here or just go ahead with #746 only instead?

Oct 26 '22 14:10 WillHalto

Is it worth trying to fix multi thread support here or just go ahead with https://github.com/ruby/debug/pull/746 only instead?

I think multi-threading support is not valuable than the advantages.

Oct 28 '22 02:10 ko1

My comment "it doesn't support multi-threads" was wrong (because TracePoint only enables on the created thread) so I'll try to merge it again.

Nov 02 '22 17:11 ko1

Faster `next` by reducing number of frame_depth calls

Description

Background

What this PR does

Examples/validation

Results