JuliaInterpreter.jl
Compile (with a Cassette Pass) until first breakpoint is hit
The idea that @MikeInnes and I came up with last night:
User actions:
- set 1 or more breakpoints
- run @compiled_run foo() (sketched below) - the code runs as compiled until the function containing the breakpoint is hit.
- Now we are in the interpreter for all stepping etc., until we step out, then we are compiled again.
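A hypothetical sketch of that workflow. @compiled_run does not exist yet and my_inner_function is a placeholder; breakpoint is JuliaInterpreter's existing API.

using JuliaInterpreter

breakpoint(my_inner_function)  # 1. set one or more breakpoints
@compiled_run foo()            # 2. runs compiled until the function containing the
                               #    breakpoint is reached, then drops into the interpreter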
Now to do this we need a Cassette-style pass that replaces every function call (in the compiled section) with the following. Pseudocode:
if contains_breakpoint(f, args...)
    interpret(f, args...)
else
    recurse(ctx, f, args...)
end
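A minimal sketch of what that pass could look like using Cassette's overdub mechanism. contains_breakpoint and interpret_call are hypothetical helpers (checking JuliaInterpreter's breakpoint table and handing the call to the interpreter, respectively); the linked proof of concept presumably does something along these lines, but this is not its actual code.

using Cassette

Cassette.@context MixedModeCtx

function Cassette.overdub(ctx::MixedModeCtx, f, args...)
    if contains_breakpoint(f, args...)
        # f has a breakpoint set in it: hand this call to JuliaInterpreter
        return interpret_call(f, args...)
    else
        # no breakpoint in f: keep running compiled,
        # but keep instrumenting the nested calls
        return Cassette.recurse(ctx, f, args...)
    end
end

# entry point: run f compiled-but-instrumented from the top
run_mixedmode_sketch(f, args...) = Cassette.overdub(MixedModeCtx(), f, args...)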
Furthermore, when Continue is run we can switch back into compiled mode.
(Possibly via a separate "ContinueCompiled" command.)
The cost of this is that you can't StepOut into compiled code,
unless you made the compiled section actually have the full instrumentation of MagneticReadHead.
But the advantage of this simpler compiler pass over MagneticReadHead
is that it doesn't add anywhere near as many statements to the code IR.
So while everything still needs to recompile, it doesn't hit those compile-time black holes that MRH has.
I don't really think this is the right way to go. It wouldn't handle cases like
...
big_loop()
breakpoint
People would need to start splitting up their functions into "work functions" and the part they want to debug, etc. Also, having to run Cassette on everything has a lot of drawbacks, so I don't think this is the solution.
I think this is a solution worth investigating. Can we leave this open until I get around to running those investigations? Maybe add a [speculative] tag?
I started to write code for it, but I am still getting familiar with how JuliaInterpreter works, so I haven't done it yet.
Proof of concept: https://github.com/oxinabox/MixedModeDebugger.jl/blob/663d9b6565531263ca9a441bffb34404e37ad130/src/proto.jl
The Cassette-less alternative to this (at least for file breakpoints) would be some kind of Infiltrator/Debugger hybrid. Setting a breakpoint in whatever UI would recompile the relevant method with an @enter spliced in, basically.
So basically if you have
function foo(x)
    y = sin(x)
    while x > y
        x -= sin(x)
    end
    x
end
and set a breakpoint on line 3 you'd end up with
function foo(x)
    y = sin(x)
    Main.Debugger.@enter((() -> begin
        while x > y
            x -= sin(x)
        end
        x
    end)())
end
or something close to that. You also wouldn't need to prefix your function invocation with @run, which is also something people have complained about.
Yeah, I initially thought that was what Infiltrator did, and was going to just insert Infiltrator's code.
I think just making an Infiltrator-style version of this would be pretty solid.
Benchmarking
With the code written to always call Continue when a breakpoint is hit.
Using code from https://github.com/oxinabox/MixedModeDebugger.jl/blob/541f63b8a61afb5a6d3a8473ae092fe304bfc5a0/src/proto.jl
New benchmarking function winter
function winter(A)
    s = zero(eltype(A))
    return winter_s1(s, A)
end

function winter_s1(s, A)
    for a in A
        s += exp(a)
    end
    return winter_s1s1(s)
end

function winter_s1s1(s)
    return s + s
end
Trial is with const x = rand(1_000, 500)
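The numbers below can be reproduced roughly like this (a sketch: it assumes the winter functions above are defined and that run_interpretted / run_mixedmode come from the linked proto.jl; the include path is illustrative):

include("proto.jl")                # provides run_interpretted and run_mixedmode
const x = rand(1_000, 500)

@time winter(x)                    # native
@time run_interpretted(winter, x)  # fully interpreted via JuliaInterpreter
@time run_mixedmode(winter, x)     # Cassette mixed mode

# Each call is repeated three times below: the 1st run includes compilation /
# interpreter start-up, the later runs show steady-state runtime.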
Native: (no breakpoints)
winter(x)
- 1st Run: 0.046176 seconds (76.61 k allocations: 4.036 MiB)
- 2nd Run: 0.004589 seconds (5 allocations: 176 bytes)
- 3rd Run: 0.004865 seconds (5 allocations: 176 bytes)
No breakpoints
run_interpretted(winter, x)
- 1st Run: 28.640364 seconds (275.66 M allocations: 7.673 GiB, 2.57% gc time)
- 2nd Run: 24.058352 seconds (269.59 M allocations: 7.384 GiB, 1.47% gc time)
- 3rd Run: 23.998528 seconds (269.59 M allocations: 7.384 GiB, 1.48% gc time)
run_mixedmode(winter, x)
- 1st Run: 1.085078 seconds (3.91 M allocations: 204.089 MiB, 2.51% gc time)
- 2nd Run: 0.003748 seconds (5 allocations: 176 bytes)
- 3rd Run: 0.004449 seconds (5 allocations: 176 bytes)
Runtime for mixed mode is the clear winner here, since it runs at native speed when there are no breakpoints.
The mixed mode compile time was 300x worse than native. There is some fiddling one can do with the compile time, with regards to what code gets generated.
Breakpoint on winter_s1
This has the same performance as putting a breakpoint on any line in winter_s1.
run_interpretted(winter, x)
- 1st Run: 29.471830 seconds (275.64 M allocations: 7.672 GiB, 2.47% gc time)
- 2nd Run: 25.295872 seconds (269.60 M allocations: 7.384 GiB, 1.56% gc time)
- 3rd Run: 25.142979 seconds (269.60 M allocations: 7.384 GiB, 1.61% gc time)
run_mixedmode(winter, x)
- 1st Run: 24.457268 seconds (272.33 M allocations: 7.520 GiB, 1.69% gc time)
- 2nd Run: 23.362447 seconds (269.60 M allocations: 7.384 GiB, 1.60% gc time)
- 3rd Run: 25.369876 seconds (269.60 M allocations: 7.384 GiB, 1.57% gc time)
So in this case it falls back to the same performance as interpreting.
This is as expected, since winter_s1 is where all the work is actually done,
so that whole function is getting interpreted.
Breakpoint on winter_s1s1
This has the same performance as putting a breakpoint on any line in winter_s1s1.
run_interpretted(winter, x)
- 1st Run: 28.968437 seconds (275.56 M allocations: 7.670 GiB, 2.54% gc time)
- 2nd Run: 24.877208 seconds (269.51 M allocations: 7.382 GiB, 1.51% gc time)
- 3rd Run: 24.963561 seconds (269.51 M allocations: 7.382 GiB, 1.61% gc time)
run_mixedmode(winter, x)
- 1st Run: 1.144460 seconds (3.96 M allocations: 207.329 MiB, 2.82% gc time)
- 2nd Run: 0.004416 seconds (185 allocations: 11.656 KiB)
- 3rd Run: 0.004966 seconds (185 allocations: 11.656 KiB)
So here mixed mode gets native speed,
since almost none of the work is done in winter_s1s1;
by the time it is called, the loop is already over.
Breakpoint on exp
This is the same as putting a breakpoint inside Base.exp.
run_interpretted(winter, x)
- 1st Run: 30.114210 seconds (278.19 M allocations: 7.741 GiB, 2.94% gc time)
- 2nd Run: 26.520003 seconds (272.12 M allocations: 7.452 GiB, 1.46% gc time)
- 3rd Run: 26.604521 seconds (272.12 M allocations: 7.452 GiB, 1.57% gc time)
This doesn't terminate within 10 minutes for the mixed mode. This is the case where we hit the breakpoint on every iteration of the loop, so we have to keep switching into interpreted mode, which I guess has some start-up overhead.
I don't think it is a particularly realistic case, since it hits that breakpoint 500_000 times, whereas normally a programmer would hit it once or twice, then disable it / stop debugging.
But it does highlight that we need to make sure that when a breakpoint is disabled, the overdubs that switch into interpreted mode are also removed.
There is a case where the breakpoint on exp is a problem: if it is a conditional breakpoint. Since mixed mode switches to interpreted regardless of whether the breakpoint is conditional or not, that case can still be hit. There are workarounds, like setting a (conditional or otherwise) breakpoint on the method that calls the one with the conditional breakpoint, if it is in a tight inner loop, so that it is already in interpreted mode. Also, we can probably make interpreted mode much cheaper to start than it is right now.
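A sketch of that workaround using JuliaInterpreter's existing breakpoint API (how the mixed-mode pass would honour it is still hypothetical):

using JuliaInterpreter

# Instead of a conditional breakpoint on exp (which the loop hits 500_000 times),
# break on the caller that owns the loop, so mixed mode drops into the
# interpreter once, before the loop starts:
breakpoint(winter_s1)

# ... inspect, step, etc., then clear everything; disabling/removing should
# also remove the overdubs that force the switch into the interpreter:
remove()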
> You also wouldn't need to prefix your function invocation with @run, which is also something people have complained about.
Getting a bit ahead of myself here, but this kind of mixed mode would also solve that in Juno. Since there is basically no overhead when no breakpoints are set, one could just have the Juno REPL always run as a debugger in mixed mode. A bit of a scary idea, but possible.
@timholy @KristofferC what's next for this idea? If I put in the work to get it into JuliaInterpreter, would that be a thing we would like? Shall we wait and talk about it at JuliaCon?
I'm a bit swamped right now, so maybe we can talk at JuliaCon? Or in a month and a half I may be able to carve out some time, around the mid-March timeframe.
I'm happy to hold til JuliaCon.