debug icon indicating copy to clipboard operation
debug copied to clipboard

Faster `next` by reducing number of frame_depth calls

Open WillHalto opened this issue 3 years ago • 5 comments

Description

Background

  • next can be quite slow when stepping over code. It appears this is a known issue due to expensive calls to DEBUGGER__.frame_depth that are made for every :line event when stepping.
    • This check is made to determine if we have returned to the same stack depth as when next was called, since that's where we want to break.
  • This performance penalty is significant when stepping over code in a large codebase with a large call stack.
  • This PR improves the performance of stepping with next by restricting when frame_depth is called for :line events.

What this PR does

  • By not checking frame_depth for every single :line event when stepping, we can save significant time.
  • We can do this based on the following assumptions:
    1. If we execute a line of code (aka, encounter a :line event) without breaking or reaching the same/lesser stack depth than where 'next' was called, it means we are at a deeper stack depth.
    2. Removing frames from the stack corresponds to a :return or :b_return event.
    3. So, if we pass a :line event without reaching the starting stack depth we can ignore all following :line events until we see a return or b_return event, since we will not be back to the same stack depth until some return event(s) occur.

Examples/validation

This file uses deep recursion with a method call to create a very large call stack:

# target.rb
def foo
  "hello"
end

def recursive(n,stop)
  foo
  return if n >= stop

  recursive(n + 1, stop)
end
  
recursive(0,1000) # line 13

We can benchmark a step over the call on line 13 with this command:

time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb

Results

Baseline - not stepping at all:

 $ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.241s
user    0m0.188s
sys     0m0.054s

Stepping before this change:

 $ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m2.188s
user    0m2.083s
sys     0m0.054s

Stepping after this change:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m1.147s
user    0m1.107s
sys     0m0.038s

So this change gives a roughly ~50% improvement in the performance of stepping for the example.

Co-authored-by: @jhawthorn

WillHalto avatar Aug 31 '22 15:08 WillHalto

This is an amazing optimization 👍

I feel we can also improve this part's readability a bit by moving relevant logic closer:

-              loc = caller_locations(2, 1).first
-              loc_path = loc.absolute_path || "!eval:#{loc.path}"
-
               stack_depth = DEBUGGER__.frame_depth - 3

               # If we're at a deeper stack depth, we can skip line events until there's a return event.
               skip_line = event == :line && stack_depth > depth

               # same stack depth
-              (stack_depth <= depth) ||
+              next true if stack_depth <= depth
+
+              loc = caller_locations(2, 1).first
+              loc_path = loc.absolute_path || "!eval:#{loc.path}"


And then we can further refactor the later part too:

               # different frame
-              (next_line && loc_path == path &&
-               (loc_lineno = loc.lineno) > line &&
-               loc_lineno <= next_line)
+              next_line && loc_path == path && loc.lineno.between?(line + 1, next_line)

So putting them all together:

-              loc = caller_locations(2, 1).first
-              loc_path = loc.absolute_path || "!eval:#{loc.path}"
-
               stack_depth = DEBUGGER__.frame_depth - 3

               # If we're at a deeper stack depth, we can skip line events until there's a return event.
               skip_line = event == :line && stack_depth > depth

               # same stack depth
-              (stack_depth <= depth) ||
+              next true if stack_depth <= depth
+
+              loc = caller_locations(2, 1).first
+              loc_path = loc.absolute_path || "!eval:#{loc.path}"

               # different frame
-              (next_line && loc_path == path &&
-               (loc_lineno = loc.lineno) > line &&
-               loc_lineno <= next_line)
+              next_line && loc_path == path && loc.lineno.between?(line + 1, next_line)

st0012 avatar Aug 31 '22 20:08 st0012

@st0012 thanks for the feedback! That's a great refactor, I did that in https://github.com/ruby/debug/pull/743/commits/7a414919c42e1a9cc3854071cc2488e945bedcb3 👍

WillHalto avatar Sep 01 '22 20:09 WillHalto

Great work in these optimisations Will 😃 Are you thinking about optimising startup/connection times too? I'm thinking about that. Haven't looked too deeply but I'm interested in a minimal config that cuts out as many 'requires' as possible. It takes the best part of 10 seconds to get debugging on my underpowered pc with a pretty simple ruby program. Feels like it should be a couple of seconds really and practically instant on a good pc.

jabamaus avatar Sep 05 '22 13:09 jabamaus

Could you measure the time with https://github.com/ruby/debug/pull/746 ?

ko1 avatar Sep 16 '22 09:09 ko1

Could you measure the time with #746 ?

@ko1 yes, here is that result:

Using the same example as above

# target.rb
def foo
  "hello"
end

def recursive(n,stop)
  foo
  return if n >= stop

  recursive(n + 1, stop)
end
  
recursive(0,1000) # line 13

Baseline - not stepping at all:

$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.252s
user    0m0.207s
sys     0m0.046s

Stepping before changes:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m2.370s
user    0m2.292s
sys     0m0.077s

Stepping with changes from https://github.com/ruby/debug/pull/743 and https://github.com/ruby/debug/pull/746

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.276s
user    0m0.228s
sys     0m0.048s

With both of these changes in place, there is a ~99% improvement in the next performance compared to the baseline.

WillHalto avatar Sep 17 '22 19:09 WillHalto

Ah, I'm sorry I need to ask you that "with #746 but without #743". I will merge #746 but I need to read details of #743 for review. But if #746 is enough, I can leave it.

ko1 avatar Oct 04 '22 07:10 ko1

Ah, I'm sorry I need to ask you that "with #746 but without #743". I will merge #746 but I need to read details of #743 for review. But if #746 is enough, I can leave it.

@ko1 oh, I understand. Test result for #746 without #743 is in the PR description here: https://github.com/ruby/debug/pull/746#issue-1359369163 Under the Examples/validation header.

Let me know if you'd like me to run any other tests on either of these PRs! 👍

WillHalto avatar Oct 11 '22 19:10 WillHalto

Ah, thank you!

This is summary:

# baseline
 $ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.241s
user    0m0.188s
sys     0m0.054s

# only with #743 
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m1.147s
user    0m1.107s
sys     0m0.038s

# only with #746
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.285s
user    0m0.218s
sys     0m0.069s

# with #743 and #746
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.276s
user    0m0.228s
sys     0m0.048s

It seems #746 is dominant, so I'll merge only #746.

#743 (this PR) I can't fully guarantee it is correct.


after reading the code, your code is more readable and ... maybe no problem, so I'll merge it.

ko1 avatar Oct 26 '22 08:10 ko1

Ah, I found an issue, it doesn't support multi-threads, so I'll revert it. Sorry.

ko1 avatar Oct 26 '22 08:10 ko1

Ah, I found an issue, it doesn't support multi-threads, so I'll revert it. Sorry.

Ah, no problem, I agree with

It seems #746 is dominant

anyway. Is it worth trying to fix multi thread support here or just go ahead with #746 only instead?

WillHalto avatar Oct 26 '22 14:10 WillHalto

Is it worth trying to fix multi thread support here or just go ahead with https://github.com/ruby/debug/pull/746 only instead?

I think multi-threading support is not valuable than the advantages.

ko1 avatar Oct 28 '22 02:10 ko1

My comment "it doesn't support multi-threads" was wrong (because TracePoint only enables on the created thread) so I'll try to merge it again.

ko1 avatar Nov 02 '22 17:11 ko1