Faster `next` by reducing number of frame_depth calls
Description
Background
-
nextcan be quite slow when stepping over code. It appears this is a known issue due to expensive calls to DEBUGGER__.frame_depth that are made for every :line event when stepping.- This check is made to determine if we have returned to the same stack depth as when
nextwas called, since that's where we want to break.
- This check is made to determine if we have returned to the same stack depth as when
- This performance penalty is significant when stepping over code in a large codebase with a large call stack.
- This PR improves the performance of stepping with
nextby restricting when frame_depth is called for:lineevents.
What this PR does
- By not checking frame_depth for every single
:lineevent when stepping, we can save significant time. - We can do this based on the following assumptions:
- If we execute a line of code (aka, encounter a
:lineevent) without breaking or reaching the same/lesser stack depth than where 'next' was called, it means we are at a deeper stack depth. - Removing frames from the stack corresponds to a
:returnor:b_returnevent. - So, if we pass a
:lineevent without reaching the starting stack depth we can ignore all following:lineevents until we see a return or b_return event, since we will not be back to the same stack depth until some return event(s) occur.
- If we execute a line of code (aka, encounter a
Examples/validation
This file uses deep recursion with a method call to create a very large call stack:
# target.rb
def foo
"hello"
end
def recursive(n,stop)
foo
return if n >= stop
recursive(n + 1, stop)
end
recursive(0,1000) # line 13
We can benchmark a step over the call on line 13 with this command:
time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
Results
Baseline - not stepping at all:
$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m0.241s
user 0m0.188s
sys 0m0.054s
Stepping before this change:
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m2.188s
user 0m2.083s
sys 0m0.054s
Stepping after this change:
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m1.147s
user 0m1.107s
sys 0m0.038s
So this change gives a roughly ~50% improvement in the performance of stepping for the example.
Co-authored-by: @jhawthorn
This is an amazing optimization 👍
I feel we can also improve this part's readability a bit by moving relevant logic closer:
- loc = caller_locations(2, 1).first
- loc_path = loc.absolute_path || "!eval:#{loc.path}"
-
stack_depth = DEBUGGER__.frame_depth - 3
# If we're at a deeper stack depth, we can skip line events until there's a return event.
skip_line = event == :line && stack_depth > depth
# same stack depth
- (stack_depth <= depth) ||
+ next true if stack_depth <= depth
+
+ loc = caller_locations(2, 1).first
+ loc_path = loc.absolute_path || "!eval:#{loc.path}"
And then we can further refactor the later part too:
# different frame
- (next_line && loc_path == path &&
- (loc_lineno = loc.lineno) > line &&
- loc_lineno <= next_line)
+ next_line && loc_path == path && loc.lineno.between?(line + 1, next_line)
So putting them all together:
- loc = caller_locations(2, 1).first
- loc_path = loc.absolute_path || "!eval:#{loc.path}"
-
stack_depth = DEBUGGER__.frame_depth - 3
# If we're at a deeper stack depth, we can skip line events until there's a return event.
skip_line = event == :line && stack_depth > depth
# same stack depth
- (stack_depth <= depth) ||
+ next true if stack_depth <= depth
+
+ loc = caller_locations(2, 1).first
+ loc_path = loc.absolute_path || "!eval:#{loc.path}"
# different frame
- (next_line && loc_path == path &&
- (loc_lineno = loc.lineno) > line &&
- loc_lineno <= next_line)
+ next_line && loc_path == path && loc.lineno.between?(line + 1, next_line)
@st0012 thanks for the feedback! That's a great refactor, I did that in https://github.com/ruby/debug/pull/743/commits/7a414919c42e1a9cc3854071cc2488e945bedcb3 👍
Great work in these optimisations Will 😃 Are you thinking about optimising startup/connection times too? I'm thinking about that. Haven't looked too deeply but I'm interested in a minimal config that cuts out as many 'requires' as possible. It takes the best part of 10 seconds to get debugging on my underpowered pc with a pretty simple ruby program. Feels like it should be a couple of seconds really and practically instant on a good pc.
Could you measure the time with https://github.com/ruby/debug/pull/746 ?
Could you measure the time with #746 ?
@ko1 yes, here is that result:
Using the same example as above
# target.rb
def foo
"hello"
end
def recursive(n,stop)
foo
return if n >= stop
recursive(n + 1, stop)
end
recursive(0,1000) # line 13
Baseline - not stepping at all:
$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m0.252s
user 0m0.207s
sys 0m0.046s
Stepping before changes:
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m2.370s
user 0m2.292s
sys 0m0.077s
Stepping with changes from https://github.com/ruby/debug/pull/743 and https://github.com/ruby/debug/pull/746
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m0.276s
user 0m0.228s
sys 0m0.048s
With both of these changes in place, there is a ~99% improvement in the next performance compared to the baseline.
Ah, I'm sorry I need to ask you that "with #746 but without #743". I will merge #746 but I need to read details of #743 for review. But if #746 is enough, I can leave it.
Ah, I'm sorry I need to ask you that "with #746 but without #743". I will merge #746 but I need to read details of #743 for review. But if #746 is enough, I can leave it.
@ko1 oh, I understand. Test result for #746 without #743 is in the PR description here: https://github.com/ruby/debug/pull/746#issue-1359369163 Under the Examples/validation header.
Let me know if you'd like me to run any other tests on either of these PRs! 👍
Ah, thank you!
This is summary:
# baseline
$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m0.241s
user 0m0.188s
sys 0m0.054s
# only with #743
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m1.147s
user 0m1.107s
sys 0m0.038s
# only with #746
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m0.285s
user 0m0.218s
sys 0m0.069s
# with #743 and #746
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real 0m0.276s
user 0m0.228s
sys 0m0.048s
It seems #746 is dominant, so I'll merge only #746.
#743 (this PR) I can't fully guarantee it is correct.
after reading the code, your code is more readable and ... maybe no problem, so I'll merge it.
Ah, I found an issue, it doesn't support multi-threads, so I'll revert it. Sorry.
Ah, I found an issue, it doesn't support multi-threads, so I'll revert it. Sorry.
Ah, no problem, I agree with
It seems #746 is dominant
anyway. Is it worth trying to fix multi thread support here or just go ahead with #746 only instead?
Is it worth trying to fix multi thread support here or just go ahead with https://github.com/ruby/debug/pull/746 only instead?
I think multi-threading support is not valuable than the advantages.
My comment "it doesn't support multi-threads" was wrong (because TracePoint only enables on the created thread) so I'll try to merge it again.