inference: remove `throw` block deoptimization completely
After experimenting with #49235, I started to question whether we are getting any actual benefit from the `throw` block deoptimization anymore.
This commit removes the deoptimization from the system entirely.
Based on the numbers below, it appears that the deoptimization is not very profitable in our current Julia-level compilation pipeline, with the effects analysis playing a significant role in reducing latency.
Here are the updated benchmarks:
| Metric | master | #49235 | this commit |
|---|---|---|---|
| Base (seconds) | 15.579300 | 15.206645 | 15.42059 |
| Stdlibs (seconds) | 17.919013 | 17.667094 | 17.404586 |
| Total (seconds) | 33.499279 | 32.874737 | 32.826162 |
| Precompilation (seconds) | 53.488528 | 53.152028 | 53.152028 |
| First time `plot(rand(10,3))` [^1] | 3.432983 seconds (16.55 M allocations) | 3.477767 seconds (16.45 M allocations) | 3.539117 seconds (16.43 M allocations) |
| First time `solve(prob, QNDF())(5.0)` [^2] | 4.628278 seconds (15.74 M allocations) | 4.609222 seconds (15.32 M allocations) | 4.547323 seconds (15.19 M allocations: 823.510 MiB) |
[^1]: With precompilation of Plots.jl disabled.
[^2]: With precompilation of OrdinaryDiffEq.jl disabled.
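For reference, the heuristic being removed here skipped optimization of basic blocks that provably end in a `throw`. A minimal sketch of the affected pattern (the function name is hypothetical, purely for illustration):

```julia
# Hypothetical example of the affected pattern: under the old heuristic the
# error branch below was left unoptimized because it always ends in `throw`;
# with this PR it is inferred and optimized like any other block.
function checked_sqrt(x::Float64)
    x < 0 && throw(DomainError(x, "checked_sqrt requires a non-negative argument, got $x"))
    return sqrt(x)
end
```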
@nanosoldier runbenchmarks("inference", vs=":master")
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
I would like to remove it, but the effect on our benchmarks seems surprisingly negative. Can we do something to reduce that, or is it just an artifact of these being small test cases?
Yeah, there does seem to be some profitability for small cases. I will add more examples to BaseBenchmarks.jl (using existing packages) and see what they show.
I think we should merge this. Now that tab completion relies on proving termination and effect-freeness, this PR also makes tab completion better.
@nanosoldier runbenchmarks("inference", vs=":master")
Your job failed.
@nanosoldier runbenchmarks("inference", vs=":master")
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
@nanosoldier runbenchmarks("inference", vs=":master")
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
@nanosoldier runbenchmarks("inference", vs=":master")
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
OK. While there are a few clear regressions, I believe the downsides this deoptimization causes in other compiler applications outweigh them. I've been wanting to get rid of the special cases for this functionality in JET too.
@nanosoldier runbenchmarks("inference", vs=":master")
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.
It looks like the regressions in cases such as `["inference", "allinference", "rand(Float64)"]` are indeed real. Should we go ahead regardless?
Besides that, this PR is ready to go. It has even fixed a few broken test cases.
I think we can go ahead!
I've done a deeper dive, and it looks like the regression is specifically caused by this code: https://github.com/JuliaLang/julia/blob/be3bc9af09deb28e51c06854278ce4de099f500d/base/error.jl#L95
Here's what I found:
```julia
julia> @newinterp LocalCacheInterp

julia> l95(b::Bool, tag::UInt, i::Int) = Base.inferencebarrier(b) ? throw(ArgumentError("Unexpected extended backtrace entry tag $tag at bt[$i]")) : nothing;

julia> interp = LocalCacheInterp();

julia> Base.infer_return_type(l95, (Bool,UInt,Int); interp)
Nothing

julia> length(interp.code_cache.dict)
2   # on master
593 # with this PR
```
(For context, running `Base.infer_return_type(rand, (Type{Float64},); interp)` results in about 700 entries in `interp.code_cache.dict`, which suggests that string interpolation accounts for a major part of the regression.)
Considering this, it might be sensible to add special handling for the `throw(... something with string interpolation ...)` pattern.
Could we make that a `LazyString` inside the `Exception` object and teach `Test` to treat those as equivalent so that not much breaks?
Are you proposing an optimization where the optimizer replaces `string(...)::String` with `lazystring(...)::LazyString` when the return value is passed into an `<:Exception` object? What would be the approach for scenarios like this?
```julia
struct MyException <: Exception
    msg::String
end

let x = ...
    throw(MyException("... $x ..."))
end
```
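For concreteness, a hand-written version of the `LazyString` idea might look like the following sketch (the exception type and `check_positive` are hypothetical, not part of this PR); the open question above is whether the optimizer could do this rewrite automatically when the field is declared `::String`:

```julia
# Hypothetical sketch: if the exception stores a LazyString, the interpolation
# is only materialized when the error is actually displayed, so the
# non-throwing path never runs `string(...)` at all.
struct MyLazyException <: Exception
    msg::LazyString
end

Base.showerror(io::IO, e::MyLazyException) = print(io, e.msg)

check_positive(x::Real) = x > 0 ? x : throw(MyLazyException(lazy"expected a positive value, got $x"))
```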
I thought we could just restrict the scope of `unoptimize_throw_blocks` to the `string(...)` calls that are known to be `throw`n.
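For concreteness, the interpolation in an error message lowers to an ordinary `string(...)` call whose only consumer is the thrown exception, so such calls should be identifiable. An illustrative (hypothetical) expansion of `l95`:

```julia
# Illustrative only: the interpolated message in `l95` is equivalent to an
# explicit `string(...)` call whose result's only consumer is the thrown
# ArgumentError, which is the case a narrower heuristic could target.
function l95_expanded(b::Bool, tag::UInt, i::Int)
    if Base.inferencebarrier(b)
        msg = string("Unexpected extended backtrace entry tag ", tag, " at bt[", i, "]")
        throw(ArgumentError(msg))
    end
    return nothing
end
```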
So, talking with @topolarity, I think that benchmark result is a bit misleading: sure, a lot of methods get inferred, but they would only be new in an empty inference cache. From a top-level glance they are mostly normal integer operations and some amount of string printing (which is very likely to already be inferred), so maybe the cost isn't too large in a real-world situation.
As an example:
```julia
julia> @newinterp LocalCacheInterp true

julia> l95(b::Bool, tag::UInt, i::Int) = Base.inferencebarrier(b) ? throw(ArgumentError("Unexpected extended backtrace entry tag $tag at bt[$i]")) : nothing;

julia> make_err(b::Bool, name::String, x::Float64) = ArgumentError("Found interesting $name with value $x? $b")

julia> interp = LocalCacheInterp();

julia> Base.infer_return_type(make_err, (Bool, String, Float64); interp); length(interp.code_cache.dict)
850

julia> Base.infer_return_type(l95, (Bool,UInt,Int); interp); length(interp.code_cache.dict)
921
```
The inference of `l95(...)` contributed ~71 new code instances, but that's not terribly surprising given that it interpolates a different set of types into a string.
Maybe it'd be more profitable to make our string interpolation code less eager to generate so many specializations, by changing the code rather than tweaking the compiler heuristics?
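As a rough sketch of that direction (`build_message` is a hypothetical helper, not existing Base code), a `@nospecialize` barrier in the message builder would keep inference from creating a fresh specialization per argument-type combination:

```julia
# Hedged sketch: with @nospecialize, a single compiled instance of
# build_message serves all argument types, instead of one specialization per
# type combination; the extra dynamic dispatch only runs on the throwing path.
@noinline function build_message(prefix::String, @nospecialize(args...))
    io = IOBuffer()
    print(io, prefix)
    for a in args
        print(io, a)
    end
    return String(take!(io))
end

l95_alt(b::Bool, tag::UInt, i::Int) =
    Base.inferencebarrier(b) ?
    throw(ArgumentError(build_message("Unexpected extended backtrace entry tag ", tag, " at bt[", i, "]"))) :
    nothing
```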
It would also be useful to see whether there is any way to show a regression from this PR on a non-synthetic benchmark. That was done in the first post of this PR, and there it looks pretty much within noise to me.
Seems this got stalled a bit?
Although benchmarks suggest that this is indeed an effective optimization, it has been quite incomplete from the beginning and, more importantly, has started to cause numerous issues in other projects. So I will prioritize the development efficiency of those projects and remove this optimization nevertheless. We could try to come up with an alternative solution to improve latency on those code paths, but it doesn't seem like a good idea to leave the situation as is. I'm going to merge this once CI passes.
CI is green after https://github.com/JuliaLang/julia/pull/55356 - Good to merge?
@nanosoldier runbenchmarks("inference", vs=":master")