julia
julia copied to clipboard
Add a Global Value Numbering Pass
This enables common subexpression elimination, fixes #50887. This is based on the NewGVN pass in LLVM.
A possible avenue for increasing the number of eliminations would be to implement code hoisting (eg. an expression that is also present in all predecessors to a block could be moved to the immediate dominator and then replaced; alternatively we could replace it with a phi node) or Partial Redundancy Elimination.
I'm not sure when it would be optimal for this pass to run, the effects of a function may allow a non inlined call to be eliminated completely whilst when inlined only parts may be deleted.
For small functions most of the time is spent on allocating two arrays (ssa_to_ssa
and the array backing val_to_ssa
). For larger functions the get!
dominates, #51091 speeds up get!
2x. I don't think there any easy optimizations left, one possible avenue would be computing use lists (a list of every instruction in which a SSAValue
is used) and using that to not have to iterate over every instruction.
That said, if we promote inaccessiblememonly to an IR flag, we could probably do this at least for things that don't touch memory, which might still be useful. Will have to wait until after #50805 though, since the flag space is currently full.
This is nice work, but doesn't work as is, because it doesn't consider the state of the heap. For example, consider the following function:
function foo() arr = Int[0] a = arr[] arr[] = 1 b = arr[] (a, b) end
As is,
a
, andb
will be considered congruent and replaced with each other. However, this is incorrect.
I'm a bit confused, this seems to work fine for me. I see the same IR before the GVN pass, after the GVN pass and after "compact 2" on master (#95af5a08926).
julia> foo()
(0, 1)
julia> ir = Base.code_ircode(foo, (); optimize_until="GVN") |> only |> first
2 1 ── %1 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Int64}, svec(Any, Int64), 0, :(:ccall), Vector{Int64}, 1, 1))::Vector{Int64}
│ Base.arrayset(false, %1, 0, 1)::Vector{Int64}unds_setindex!
3 │ %3 = $(Expr(:boundscheck, true))::Bool getindex
└─── goto #16 if not %3 ││┃ _getindex
2 ── %5 = Base.arraysize(%1, 1)::Int64││╻╷╷ checkbounds
└─── goto #3 ││││╻╷╷╷ checkbounds
3 ── goto #4 │││││┃││││ checkbounds_indices
4 ── goto #5 ││││││┃││ all
5 ── %9 = (%5 === 1)::Bool │││││││╻╷ _all
└─── goto #7 if not %9 │││││││╻ _all
6 ── goto #8 ││││││││
7 ── goto #9 ││││││││
8 ── goto #9 ││││││││
9 ┄─ %14 = φ (#7 => false, #8 => true)::Bool│
└─── goto #10 │││││││
10 ─ goto #11 ││││││
11 ─ goto #12 │││││
12 ─ goto #14 if not %14 ││││
13 ─ goto #15 ││││
14 ─ invoke Base.throw_boundserror(%1::Vector{Int64}, ()::Tuple{})::Union{}
└─── unreachable ││││
15 ─ nothing::Nothing │
16 ┄ %23 = $(Expr(:boundscheck, false))::Bool getindex
│ %24 = Base.arrayref(%23, %1, 1)::Int64
└─── goto #17 ││╻ _getindex
17 ─ goto #18 ││
4 18 ─ %27 = $(Expr(:boundscheck, true))::Bool setindex!
└─── goto #33 if not %27 ││┃ _setindex!
19 ─ %29 = Base.arraysize(%1, 1)::Int64││╻╷╷ checkbounds
└─── goto #20 ││││╻╷╷╷ checkbounds
20 ─ goto #21 │││││┃││││ checkbounds_indices
21 ─ goto #22 ││││││┃││ all
22 ─ %33 = (%29 === 1)::Bool │││││││╻╷ _all
└─── goto #24 if not %33 │││││││╻ _all
23 ─ goto #25 ││││││││
24 ─ goto #26 ││││││││
25 ─ goto #26 ││││││││
26 ┄ %38 = φ (#24 => false, #25 => true)::Bool
└─── goto #27 │││││││
27 ─ goto #28 ││││││
28 ─ goto #29 │││││
29 ─ goto #31 if not %38 ││││
30 ─ goto #32 ││││
31 ─ invoke Base.throw_boundserror(%1::Vector{Int64}, ()::Tuple{})::Union{}
└─── unreachable ││││
32 ─ nothing::Nothing │
33 ┄ %47 = $(Expr(:boundscheck, false))::Bool setindex!
│ Base.arrayset(%47, %1, 1, 1)::Vector{Int64}
└─── goto #34 ││╻ _setindex!
34 ─ goto #35 ││
5 35 ─ %51 = $(Expr(:boundscheck, true))::Bool getindex
└─── goto #50 if not %51 ││┃ _getindex
36 ─ %53 = Base.arraysize(%1, 1)::Int64││╻╷╷ checkbounds
└─── goto #37 ││││╻╷╷╷ checkbounds
37 ─ goto #38 │││││┃││││ checkbounds_indices
38 ─ goto #39 ││││││┃││ all
39 ─ %57 = (%53 === 1)::Bool │││││││╻╷ _all
└─── goto #41 if not %57 │││││││╻ _all
40 ─ goto #42 ││││││││
41 ─ goto #43 ││││││││
42 ─ goto #43 ││││││││
43 ┄ %62 = φ (#41 => false, #42 => true)::Bool
└─── goto #44 │││││││
44 ─ goto #45 ││││││
45 ─ goto #46 │││││
46 ─ goto #48 if not %62 ││││
47 ─ goto #49 ││││
48 ─ invoke Base.throw_boundserror(%1::Vector{Int64}, ()::Tuple{})::Union{}
└─── unreachable ││││
49 ─ nothing::Nothing │
50 ┄ %71 = $(Expr(:boundscheck, false))::Bool getindex
│ %72 = Base.arrayref(%71, %1, 1)::Int64
└─── goto #51 ││╻ _getindex
51 ─ goto #52 ││
6 52 ─ %75 = Core.tuple(%24, %72)::Tuple{Int64, Int64}
└─── return %75
Looking at the flags of arrayref
,arrayset
and arraysize
we get 0x00000110
, 0x00000100
and 0x00000130
respectively so these don't get touched by this pass anyways (the consistent flag isn't set on any of them).
Hmm, I may have been wrong, and IR_FLAG_CONSISTENT might be strong enough. I'll need to think about it again to make sure.
Does this fix #50887?
Does this fix #50887?
Yes that was the inspiration for this.
FTR, if you put a Fixes #50887
line into a commit or PR description, Github is supposed to automatically close the issue.
I'm not sure when it would be optimal for this pass to run, the effects of a function may allow a non inlined call to be eliminated completely whilst when inlined only parts may be deleted.
You already give a reason why it's useful before inlining, here's a reason for having the pass run after inlining:
Suppose I'm calling a function whose implementation I can't modify, such as sqrt
, but I want some error checks to be eliminated. In the case of sqrt(x)
this would be a check that throws in case x < 0
, so assume we know 0 ≤ x
. This assumption may be encoded in Julia like so: (x < 0) && Core.Intrinsics.llvmcall("unreachable", Union{}, Tuple{})
. In this way, one is already able to eliminate the check, but only after LLVM gets involved:
function unsafe_sqrt(x)
@inline begin
is_negative = x < zero(x)
is_negative &&
Core.Intrinsics.llvmcall("unreachable", Union{}, Tuple{})
sqrt(x)
end
end
julia> Base.infer_effects(unsafe_sqrt, Tuple{Float64}) # Julia thinks the function may throw
(+c,!e,!n,!t,!s,!m,!u)′
julia> @code_typed unsafe_sqrt(0.3)
CodeInfo(
1 ─ %1 = Base.lt_float(x, 0.0)::Bool
└── goto #3 if not %1
2 ─ (Core.Intrinsics.llvmcall)("unreachable", Union{}, Tuple{})::Union{}
└── unreachable
3 ─ %5 = Base.lt_float(x, 0.0)::Bool
└── goto #5 if not %5
4 ─ invoke Base.Math.throw_complex_domainerror(:sqrt::Symbol, x::Float64)::Union{}
└── unreachable
5 ─ %9 = Base.Math.sqrt_llvm(x)::Float64
└── goto #6
6 ─ return %9
) => Float64
julia> @code_native debuginfo=:none unsafe_sqrt(0.3)
.text
.file "unsafe_sqrt"
.globl julia_unsafe_sqrt_7199 # -- Begin function julia_unsafe_sqrt_7199
.p2align 4, 0x90
.type julia_unsafe_sqrt_7199,@function
julia_unsafe_sqrt_7199: # @julia_unsafe_sqrt_7199
; Function Signature: unsafe_sqrt(Float64)
# %bb.0: # %top
#DEBUG_VALUE: unsafe_sqrt:x <- $xmm0
push rbp
mov rbp, rsp
vsqrtsd xmm0, xmm0, xmm0
pop rbp
ret
So LLVM successfully optimizes the machine code to a single vsqrtsd
operation, with no throwing, but Julia doesn't know that unsafe_sqrt
doesn't throw. I'm thinking if there was a common-subexpression elimination pass after inlining, it could deduplicate the two Base.lt_float(x, 0.0)::Bool
calls, after which it would be easier (although still not implemented, I guess) for Julia to eliminate the throw as unreachable.
@nanosoldier runbenchmarks("inference", vs=":master")
I think it might be a good idea to run PkgEval on this to root out any more bugs. #51091 and #53844 should speed up this pr by 2-3x.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.