inference: remove `throw` block deoptimization completely
After experimenting with #49235, I started to question whether we are getting any actual benefit from the `throw` block deoptimization anymore.
This commit removes the deoptimization from the system entirely.
Based on the numbers below, it appears that the deoptimization is not very profitable in our current Julia-level compilation pipeline, with the effects analysis playing a significant role in reducing latency.
Here are the updated benchmarks:
| Metric | master | #49235 | this commit |
|---|---|---|---|
| Base (seconds) | 15.579300 | 15.206645 | 15.42059 |
| Stdlibs (seconds) | 17.919013 | 17.667094 | 17.404586 |
| Total (seconds) | 33.499279 | 32.874737 | 32.826162 |
| Precompilation (seconds) | 53.488528 | 53.152028 | 53.152028 |
| First time `plot(rand(10,3))` [^1] | 3.432983 seconds (16.55 M allocations) | 3.477767 seconds (16.45 M allocations) | 3.539117 seconds (16.43 M allocations) |
| First time `solve(prob, QNDF())(5.0)` [^2] | 4.628278 seconds (15.74 M allocations) | 4.609222 seconds (15.32 M allocations) | 4.547323 seconds (15.19 M allocations: 823.510 MiB) |
[^1]: With precompilation of Plots.jl disabled.
[^2]: With precompilation of OrdinaryDiffEq.jl disabled.
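For reference, the heuristic being removed here skipped optimization of basic blocks that provably end in a `throw`. A minimal sketch of the affected pattern (the function name is hypothetical, purely for illustration):

```julia
# Hypothetical example of the affected pattern: under the old heuristic the
# error branch below was left unoptimized because it always ends in `throw`;
# with this PR it is inferred and optimized like any other block.
function checked_sqrt(x::Float64)
    x < 0 && throw(DomainError(x, "checked_sqrt requires a non-negative argument, got $x"))
    return sqrt(x)
end
```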
@nanosoldier runbenchmarks("inference", vs=":master")
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
I would like to remove it, but the effect on our benchmarks seems surprisingly negative. Can we do something to reduce that, or is it just an artifact of these being small test cases?
Yeah, there does seem to be some profitability for small cases. I will add more examples to BaseBenchmarks.jl (using existing packages) and see what they show.
I think we should merge this. Now that tab completion relies on proving termination and effect-freeness, this PR also makes tab completion better.
@nanosoldier runbenchmarks("inference", vs=":master")
Your job failed.
@nanosoldier runbenchmarks("inference", vs=":master")
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
@nanosoldier runbenchmarks("inference", vs=":master")
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
@nanosoldier runbenchmarks("inference", vs=":master")
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
OK. While there are a few clear regressions, I believe the downsides this deoptimization causes in other compiler applications outweigh them. I've been wanting to get rid of the special cases for this functionality in JET too.
@nanosoldier runbenchmarks("inference", vs=":master")
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
It looks like the regressions in cases such as `["inference", "allinference", "rand(Float64)"]` are indeed real. Should we go ahead regardless?
Besides that, this PR is ready to go. It has even fixed a few broken test cases.
I think we can go ahead!
I've done a deeper dive, and it looks like the regression is specifically caused by this code: https://github.com/JuliaLang/julia/blob/be3bc9af09deb28e51c06854278ce4de099f500d/base/error.jl#L95
Here's what I found:
```julia
julia> @newinterp LocalCacheInterp

julia> l95(b::Bool, tag::UInt, i::Int) = Base.inferencebarrier(b) ? throw(ArgumentError("Unexpected extended backtrace entry tag $tag at bt[$i]")) : nothing;

julia> interp = LocalCacheInterp();

julia> Base.infer_return_type(l95, (Bool,UInt,Int); interp)
Nothing

julia> length(interp.code_cache.dict)
2   # on master
593 # with this PR
```
(For context, running `Base.infer_return_type(rand, (Type{Float64},); interp)` results in about 700 entries in `interp.code_cache.dict`, which suggests that string interpolation accounts for a major part of the regression.)
Considering this, it might be sensible to add special handling for the `throw(... something with string interpolation ...)` pattern.
Could we make that a `LazyString` inside the `Exception` object and teach `Test` to treat those as equivalent so that not much breaks?
Are you proposing an optimization where the optimizer replaces `string(...)::String` with `lazystring(...)::LazyString` when the return value is passed into an `<:Exception` object? What would be the approach for scenarios like this?
```julia
struct MyException <: Exception
    msg::String
end

let x = ...
    throw(MyException("... $x ..."))
end
```
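For concreteness, a hand-written version of the `LazyString` idea might look like the following sketch (the exception type and `check_positive` are hypothetical, not part of this PR); the open question above is whether the optimizer could do this rewrite automatically when the field is declared `::String`:

```julia
# Hypothetical sketch: if the exception stores a LazyString, the interpolation
# is only materialized when the error is actually displayed, so the
# non-throwing path never runs `string(...)` at all.
struct MyLazyException <: Exception
    msg::LazyString
end

Base.showerror(io::IO, e::MyLazyException) = print(io, e.msg)

check_positive(x::Real) = x > 0 ? x : throw(MyLazyException(lazy"expected a positive value, got $x"))
```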
I thought we could just restrict the scope of `unoptimize_throw_blocks` to the `string(...)` calls that are known to be `throw`n.
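For concreteness, the interpolation in an error message lowers to an ordinary `string(...)` call whose only consumer is the thrown exception, so such calls should be identifiable. An illustrative (hypothetical) expansion of `l95`:

```julia
# Illustrative only: the interpolated message in `l95` is equivalent to an
# explicit `string(...)` call whose result's only consumer is the thrown
# ArgumentError, which is the case a narrower heuristic could target.
function l95_expanded(b::Bool, tag::UInt, i::Int)
    if Base.inferencebarrier(b)
        msg = string("Unexpected extended backtrace entry tag ", tag, " at bt[", i, "]")
        throw(ArgumentError(msg))
    end
    return nothing
end
```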
So, talking with @topolarity, I think that benchmark result is a bit misleading: sure, a lot of methods get inferred, but they would only be new in an empty inference cache. From a top-level glance they are mostly normal integer operations and some amount of string printing (which is very likely to already be inferred), so maybe the cost isn't too large in a real-world situation.
As an example:
```julia
julia> @newinterp LocalCacheInterp true

julia> l95(b::Bool, tag::UInt, i::Int) = Base.inferencebarrier(b) ? throw(ArgumentError("Unexpected extended backtrace entry tag $tag at bt[$i]")) : nothing;

julia> make_err(b::Bool, name::String, x::Float64) = ArgumentError("Found interesting $name with value $x? $b")

julia> interp = LocalCacheInterp();

julia> Base.infer_return_type(make_err, (Bool, String, Float64); interp); length(interp.code_cache.dict)
850

julia> Base.infer_return_type(l95, (Bool,UInt,Int); interp); length(interp.code_cache.dict)
921
```
The inference of `l95(...)` contributed ~71 new code instances, but that's not terribly surprising given that it interpolates a different set of types into a string.
Maybe it'd be more profitable to make our string interpolation code less eager to generate so many specializations, by changing the code rather than tweaking the compiler heuristics?
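As a rough sketch of that direction (`build_message` is a hypothetical helper, not existing Base code), a `@nospecialize` barrier in the message builder would keep inference from creating a fresh specialization per argument-type combination:

```julia
# Hedged sketch: with @nospecialize, a single compiled instance of
# build_message serves all argument types, instead of one specialization per
# type combination; the extra dynamic dispatch only runs on the throwing path.
@noinline function build_message(prefix::String, @nospecialize(args...))
    io = IOBuffer()
    print(io, prefix)
    for a in args
        print(io, a)
    end
    return String(take!(io))
end

l95_alt(b::Bool, tag::UInt, i::Int) =
    Base.inferencebarrier(b) ?
    throw(ArgumentError(build_message("Unexpected extended backtrace entry tag ", tag, " at bt[", i, "]"))) :
    nothing
```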
It would also be useful to see whether there is any way to show a regression from this PR on a non-synthetic benchmark. That was done in the first post of this PR, and there it looks pretty much within noise to me.
Seems this got stalled a bit?
Although benchmarks suggest that this is indeed an effective optimization, it has been quite incomplete from the beginning and, more importantly, has started to cause numerous issues in other projects. So I will prioritize the development efficiency of those projects and remove this optimization nevertheless. We could try to come up with an alternative solution to improve latency on those code paths, but it doesn't seem like a good idea to leave the situation as is. I'm going to merge this once CI passes.
CI is green after https://github.com/JuliaLang/julia/pull/55356 - Good to merge?
@nanosoldier runbenchmarks("inference", vs=":master")