
inference: remove `throw` block deoptimization completely

Open aviatesk opened this issue 1 year ago • 26 comments

After experimenting with #49235, I started to question if we are getting any actual benefit from the throw block deoptimization anymore.

This commit removes the deoptimization from the system entirely.

Based on the numbers below, it appears that the deoptimization is not very profitable in our current Julia-level compilation pipeline, with the effects analysis playing a significant role in reducing latency.
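For context, the throw-block deoptimization made inference skip full analysis of basic blocks that provably terminate in `throw`, on the theory that error paths are cold and not worth compile time. A minimal sketch of the kind of code it affected (the function is hypothetical, for illustration only):

```julia
function checked_sqrt(x::Real)
    # The block below provably ends in `throw`, so under the
    # deoptimization it (including the string-interpolation machinery
    # behind "$x") was left largely unanalyzed, trading error-path code
    # quality and effect precision for compile time.
    x < 0 && throw(DomainError(x, "expected a non-negative value, got $x"))
    return sqrt(x)
end
```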

Here are the updated benchmarks:

| Metric | master | #49235 | this commit |
| --- | --- | --- | --- |
| Base (seconds) | 15.579300 | 15.206645 | 15.42059 |
| Stdlibs (seconds) | 17.919013 | 17.667094 | 17.404586 |
| Total (seconds) | 33.499279 | 32.874737 | 32.826162 |
| Precompilation (seconds) | 53.488528 | 53.152028 | 53.152028 |
| First time `plot(rand(10,3))` [^1] | 3.432983 seconds (16.55 M allocations) | 3.477767 seconds (16.45 M allocations) | 3.539117 seconds (16.43 M allocations) |
| First time `solve(prob, QNDF())(5.0)` [^2] | 4.628278 seconds (15.74 M allocations) | 4.609222 seconds (15.32 M allocations) | 4.547323 seconds (15.19 M allocations: 823.510 MiB) |

[^1]: With precompilation of Plots.jl disabled.
[^2]: With precompilation of OrdinaryDiffEq.jl disabled.

aviatesk avatar Apr 05 '23 13:04 aviatesk

@nanosoldier runbenchmarks("inference", vs=":master")

aviatesk avatar Apr 05 '23 13:04 aviatesk

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

nanosoldier avatar Apr 05 '23 14:04 nanosoldier

I would like to remove it, but this seems surprisingly strongly negative on our benchmarks. Can we do something to reduce that, or is that just some sort of artifact of being a small test case?

vtjnash avatar Apr 06 '23 00:04 vtjnash

Yeah, the deoptimization does seem profitable for small cases. I will add more examples to BaseBenchmarks.jl (using existing packages) and see what they show.

aviatesk avatar Apr 06 '23 07:04 aviatesk

I think we should merge this. Now that tab completion relies on proving termination and effect-freeness, this PR also makes tab completion better.
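For reference, the effect bits that inference derives for a call can be inspected with `Base.infer_effects` (a non-exported Base API; the printed format varies across Julia versions). The method `f` below is hypothetical, just to illustrate why this matters for features like tab completion, which consult these bits to decide whether a call is safe to evaluate eagerly:

```julia
# Hypothetical method whose error path interpolates into a string:
f(x::Int) = x < 0 ? throw(ArgumentError("negative: $x")) : x + 1

# Inspect the effects inference derives for f(::Int). With the throw
# block left unanalyzed, the error path can pessimize these bits.
Base.infer_effects(f, (Int,))
```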

oscardssmith avatar Apr 20 '23 18:04 oscardssmith

@nanosoldier runbenchmarks("inference", vs=":master")

aviatesk avatar Sep 20 '23 09:09 aviatesk

Your job failed.

nanosoldier avatar Sep 20 '23 09:09 nanosoldier

@nanosoldier runbenchmarks("inference", vs=":master")

aviatesk avatar Sep 20 '23 10:09 aviatesk

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

nanosoldier avatar Sep 20 '23 11:09 nanosoldier

@nanosoldier runbenchmarks("inference", vs=":master")

aviatesk avatar Dec 07 '23 08:12 aviatesk

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

nanosoldier avatar Dec 07 '23 09:12 nanosoldier

@nanosoldier runbenchmarks("inference", vs=":master")

aviatesk avatar Jan 18 '24 07:01 aviatesk

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

nanosoldier avatar Jan 18 '24 08:01 nanosoldier

Ok. While there are a few clear regressions, I believe the downsides in the other compiler applications outweigh them. I've been wanting to get rid of the special cases for this functionality in JET too.

aviatesk avatar Apr 11 '24 07:04 aviatesk

@nanosoldier runbenchmarks("inference", vs=":master")

aviatesk avatar Apr 11 '24 13:04 aviatesk

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

nanosoldier avatar Apr 11 '24 14:04 nanosoldier

It looks like the regressions in cases such as ["inference", "allinference", "rand(Float64)"] are indeed real. Should we go ahead regardless?

aviatesk avatar Apr 12 '24 13:04 aviatesk

Besides that, this PR is ready to go. It has even fixed a few broken test cases.

aviatesk avatar Apr 12 '24 13:04 aviatesk

I think we can go ahead!

gbaraldi avatar Apr 12 '24 14:04 gbaraldi

I've done a deeper dive, and it looks like the regression is specifically caused by this code: https://github.com/JuliaLang/julia/blob/be3bc9af09deb28e51c06854278ce4de099f500d/base/error.jl#L95. Here's what I found:

```julia
julia> @newinterp LocalCacheInterp

julia> l95(b::Bool, tag::UInt, i::Int) = Base.inferencebarrier(b) ? throw(ArgumentError("Unexpected extended backtrace entry tag $tag at bt[$i]")) : nothing;

julia> interp = LocalCacheInterp();

julia> Base.infer_return_type(l95, (Bool,UInt,Int); interp)
Nothing

julia> length(interp.code_cache.dict)
2    # on master
593  # on this PR
```

(For context, running `Base.infer_return_type(rand, (Type{Float64},); interp)` results in about 700 entries in `interp.code_cache.dict`, suggesting that string interpolation is a major part of the regression.)

Considering this, it might be sensible to add special handling for the `throw(... something with string interpolation ...)` pattern.

aviatesk avatar Apr 12 '24 16:04 aviatesk

Could we make that a `LazyString` inside the `Exception` object, and teach Test to treat the two as equivalent so that not much breaks?

vtjnash avatar Apr 12 '24 17:04 vtjnash

Are you proposing an optimization where the optimizer replaces `string(...)::String` with `lazystring(...)::LazyString` when the return value is passed into a `<:Exception` object? What would be the approach for scenarios like this?

```julia
struct MyException <: Exception
    msg::String
end

let x = ...
    throw(MyException("... $x ..."))
end
```
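For reference, Base already ships `LazyString` and the `lazy"..."` string macro (since Julia 1.8), which defer the interpolation work until the message is actually rendered. A sketch of the pattern being discussed, using a hypothetical `check` function (many Base exception types accept any `AbstractString`, so a `LazyString` fits there):

```julia
# lazy"..." stores the interpolation parts and only builds the String
# when the message is printed, keeping the throwing block cheap to infer.
check(tag::UInt, i::Int) =
    throw(ArgumentError(lazy"Unexpected extended backtrace entry tag $tag at bt[$i]"))
```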

aviatesk avatar Apr 15 '24 05:04 aviatesk

I thought we could just restrict the scope of `unoptimize_throw_blocks` to the `string(...)` calls that are known to be thrown.

aviatesk avatar Apr 15 '24 05:04 aviatesk

So, talking with @topolarity, I think that benchmark result is a bit misleading: sure, a lot of methods get inferred, but they would only be new in an empty inference cache. From a top-level glance they are mostly normal integer operations and a fair amount of string printing (which is very likely to already be inferred), so the cost may not be too large in a real-world situation.

gbaraldi avatar Apr 24 '24 18:04 gbaraldi

As an example:

```julia
julia> @newinterp LocalCacheInterp true

julia> l95(b::Bool, tag::UInt, i::Int) = Base.inferencebarrier(b) ? throw(ArgumentError("Unexpected extended backtrace entry tag $tag at bt[$i]")) : nothing;

julia> make_err(b::Bool, name::String, x::Float64) = ArgumentError("Found interesting $name with value $x? $b")

julia> interp = LocalCacheInterp();

julia> Base.infer_return_type(make_err, (Bool, String, Float64); interp); length(interp.code_cache.dict)
850

julia> Base.infer_return_type(l95, (Bool,UInt,Int); interp); length(interp.code_cache.dict)
921
```

The inference of `l95(...)` contributed ~71 new code instances, but that's not terribly surprising given that it is interpolating a different set of types into a string.

Maybe it'd be more profitable to make our string interpolation code less eager to generate so many specializations? That is, by changing the code rather than by tweaking the compiler heuristics.
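One possible direction (an illustrative sketch under my own assumptions, not what Base actually does): funnel the interpolated parts through a type-erased container so that interpolating `(UInt, Int)` versus `(String, Float64)` reuses the same compiled code instead of minting fresh specializations. The helper name `unspecialized_string` is hypothetical:

```julia
# Hypothetical helper: taking Vector{Any} erases the element types, so
# only one method instance is compiled regardless of what is interpolated.
@noinline function unspecialized_string(parts::Vector{Any})
    io = IOBuffer()
    for p in parts
        print(io, p)   # dynamic dispatch per part, but no new specializations
    end
    return String(take!(io))
end

# Usage: the caller boxes the parts into Any[] instead of splatting them.
msg = unspecialized_string(Any["tag ", 0x01, " at bt[", 3, "]"])
```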

topolarity avatar Apr 24 '24 18:04 topolarity

It would also be useful to see if there is any way to show a regression from this PR on a non-synthetic benchmark. That was done in the first post of this PR, and there the results pretty much look within noise to me.

KristofferC avatar Apr 24 '24 18:04 KristofferC

Seems this got stalled a bit?

fingolfin avatar May 31 '24 12:05 fingolfin

Although benchmarks suggest that this is indeed an effective optimization, it has been quite incomplete from the beginning and, more importantly, has started to cause numerous issues in other projects. So I will prioritize the development efficiency of those projects and remove this optimization nevertheless. We could try to come up with an alternative solution to improve latency on those code paths, but it doesn't seem like a good idea to leave the situation as is. I'm going to merge this once I confirm CI passes.

aviatesk avatar Jul 05 '24 07:07 aviatesk

CI is green after https://github.com/JuliaLang/julia/pull/55356 - Good to merge?

topolarity avatar Aug 07 '24 14:08 topolarity

@nanosoldier runbenchmarks("inference", vs=":master")

aviatesk avatar Aug 07 '24 18:08 aviatesk