optimizer: early `finalize` insertion
Currently, the finalizer inlining pass simply bails out unless all of the code between the finalizer registration and the end of the object's lifetime (i.e., the point where the finalizer would be inlined) is marked `:nothrow`. Even in that case, however, we can insert a `finalize` call at the end of the object's lifetime, so that the finalizer runs early whenever no exception occurs.
This commit implements this optimization. To do so, it also moves `finalize` to `Core` so that the compiler can handle it directly.
Instead of simply inserting `finalize`, I changed the approach to inline the finalizer body and cancel the finalizer registration via a new builtin, `Core._cancel_finalizer`.
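For intuition, here is a hand-written analogue of the transformation. It is only a sketch: it calls the existing `finalize` function by hand rather than showing what the compiler actually emits, and the name `manual_early_finalize` and its `maythrow` flag are made up for illustration.

```julia
mutable struct AtomicCounter
    @atomic count::Int
end

const counter = AtomicCounter(0)

# Hypothetical helper: mimics by hand what early `finalize` insertion aims for.
function manual_early_finalize(x, maythrow::Bool)
    xs = finalizer(Ref(x)) do obj
        @atomic counter.count += obj[]
    end
    # Code that may throw sits between the registration and the end of the lifetime.
    maythrow && error("thrown before the end of the object's lifetime")
    result = xs[] += 1
    # End of the lifetime of `xs`: run its finalizer now instead of waiting for the GC.
    finalize(xs)
    return result
end

result = manual_early_finalize(1, false)
```

If the error path is taken, the explicit `finalize` call is never reached and the finalizer stays registered, so the GC still runs it later.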
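It looks like this PR now performs well on the benchmark below: the mean time drops from about 46 ns to about 19 ns and the mean GC share from 30.60% to 1.35%, although the median rises from about 11.9 ns to about 18.3 ns.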
```julia
using BenchmarkTools

mutable struct AtomicCounter
    @atomic count::Int
end

const counter = AtomicCounter(0)
const _throw_or_not = Ref(false)

# Opaque to the optimizer, so the call below is not known to be `:nothrow`.
@noinline throw_or_noop() = _throw_or_not[] ? error("") : nothing

function withfinalizer(x)
    xs = finalizer(Ref(x)) do obj
        Base.@assume_effects :nothrow :notaskstate
        @atomic counter.count += obj[]
    end
    throw_or_noop() # may throw between the registration and the end of the lifetime
    return xs[] += 1
end

@benchmark withfinalizer(0)
```
master
```
julia> @benchmark withfinalizer(1)
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  11.177 ns … 124.200 μs  ┊ GC (min … max):  0.00% … 39.95%
 Time  (median):     11.928 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   46.081 ns ±   1.765 μs  ┊ GC (mean ± σ):  30.60% ±  0.84%

  ▃▄▅▇▇█▆▆▄▅▆▆▅▅▃▃▂▁ ▁ ▁▁ ▁▂▃▁▁▁ ▁ ▃
  ▇██████████████████████▇█▇█▇█▇▇▆▇▇█▇█▇███████████▇▆▇▆▄▆▃▄▁▄▄ █
  11.2 ns      Histogram: log(frequency) by time      17.4 ns <

 Memory estimate: 16 bytes, allocs estimate: 1.
```
this PR
```
julia> @benchmark withfinalizer(1)
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  17.327 ns …  1.416 μs  ┊ GC (min … max):  0.00% … 97.78%
 Time  (median):     18.329 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):   18.802 ns ± 16.372 ns  ┊ GC (mean ± σ):   1.35% ±  1.65%

  ▂█▇▂
  ▂▂▂▂▂▂▃▅▆████▇▅▃▂▂▃▄▄▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  17.3 ns        Histogram: frequency by time        22.6 ns <

 Memory estimate: 16 bytes, allocs estimate: 1.
```
This PR implements an optimization that inlines the target finalizer call and, instead of removing the finalizer registration as before, cancels it by calling `Core._cancel_finalizer`. With finalizer cancellation, only the target finalizer (and its corresponding object) is removed from the finalizer list, so I believe the order of the other finalizers is left unchanged. Am I misunderstanding something here?
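To make the ordering argument concrete, here is a toy model. It is not the runtime's actual data structure or the real builtin: the finalizer list is modelled as a plain `Vector` of `(object, finalizer)` pairs, and `toy_cancel_finalizer!` is a hypothetical stand-in for cancellation.

```julia
# Toy model only: remove the first entry matching (obj, fn), leaving the rest in place.
function toy_cancel_finalizer!(list::Vector{Tuple{Any,Any}}, obj, fn)
    idx = findfirst(e -> e[1] === obj && e[2] === fn, list)
    idx === nothing || deleteat!(list, idx)
    return list
end

a, b, c = Ref(1), Ref(2), Ref(3)
f = _ -> nothing

# Registration order: a, b, c.
list = Tuple{Any,Any}[(a, f), (b, f), (c, f)]

# Cancel only the entry for `b`.
toy_cancel_finalizer!(list, b, f)
```

In this model the entries for `a` and `c` keep their relative order after the cancellation, which is the property I am assuming the real implementation preserves.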