julia Add a Global Value Numbering Pass

This enables common subexpression elimination, fixes #50887. This is based on the NewGVN pass in LLVM.

A possible avenue for increasing the number of eliminations would be to implement code hoisting (eg. an expression that is also present in all predecessors to a block could be moved to the immediate dominator and then replaced; alternatively we could replace it with a phi node) or Partial Redundancy Elimination.

I'm not sure when it would be optimal for this pass to run, the effects of a function may allow a non inlined call to be eliminated completely whilst when inlined only parts may be deleted.

For small functions most of the time is spent on allocating two arrays (ssa_to_ssa and the array backing val_to_ssa). For larger functions the get! dominates, #51091 speeds up get! 2x. I don't think there any easy optimizations left, one possible avenue would be computing use lists (a list of every instruction in which a SSAValue is used) and using that to not have to iterate over every instruction.

Aug 30 '23 19:08 Zentrik

That said, if we promote inaccessiblememonly to an IR flag, we could probably do this at least for things that don't touch memory, which might still be useful. Will have to wait until after #50805 though, since the flag space is currently full.

Aug 30 '23 23:08 Keno

This is nice work, but doesn't work as is, because it doesn't consider the state of the heap. For example, consider the following function:
function foo()
    arr = Int[0]
    a = arr[]
    arr[] = 1
    b = arr[]
    (a, b)
end
As is, a, and b will be considered congruent and replaced with each other. However, this is incorrect.

I'm a bit confused, this seems to work fine for me. I see the same IR before the GVN pass, after the GVN pass and after "compact 2" on master (#95af5a08926).

julia> foo()
(0, 1)

julia> ir = Base.code_ircode(foo, (); optimize_until="GVN") |> only |> first
2 1 ── %1  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Int64}, svec(Any, Int64), 0, :(:ccall), Vector{Int64}, 1, 1))::Vector{Int64}
  │          Base.arrayset(false, %1, 0, 1)::Vector{Int64}unds_setindex!
3 │    %3  = $(Expr(:boundscheck, true))::Bool     getindex
  └───       goto #16 if not %3         ││┃         _getindex
  2 ── %5  = Base.arraysize(%1, 1)::Int64││╻╷╷       checkbounds
  └───       goto #3                    ││││╻╷╷╷      checkbounds
  3 ──       goto #4                    │││││┃││││     checkbounds_indices
  4 ──       goto #5                    ││││││┃││       all
  5 ── %9  = (%5 === 1)::Bool           │││││││╻╷        _all
  └───       goto #7 if not %9          │││││││╻         _all
  6 ──       goto #8                    ││││││││  
  7 ──       goto #9                    ││││││││  
  8 ──       goto #9                    ││││││││  
  9 ┄─ %14 = φ (#7 => false, #8 => true)::Bool│   
  └───       goto #10                   │││││││   
  10 ─       goto #11                   ││││││    
  11 ─       goto #12                   │││││     
  12 ─       goto #14 if not %14        ││││      
  13 ─       goto #15                   ││││      
  14 ─       invoke Base.throw_boundserror(%1::Vector{Int64}, ()::Tuple{})::Union{}
  └───       unreachable                ││││      
  15 ─       nothing::Nothing           │         
  16 ┄ %23 = $(Expr(:boundscheck, false))::Bool      getindex
  │    %24 = Base.arrayref(%23, %1, 1)::Int64     
  └───       goto #17                   ││╻         _getindex
  17 ─       goto #18                   ││        
4 18 ─ %27 = $(Expr(:boundscheck, true))::Bool     setindex!
  └───       goto #33 if not %27        ││┃         _setindex!
  19 ─ %29 = Base.arraysize(%1, 1)::Int64││╻╷╷       checkbounds
  └───       goto #20                   ││││╻╷╷╷      checkbounds
  20 ─       goto #21                   │││││┃││││     checkbounds_indices
  21 ─       goto #22                   ││││││┃││       all
  22 ─ %33 = (%29 === 1)::Bool          │││││││╻╷        _all
  └───       goto #24 if not %33        │││││││╻         _all
  23 ─       goto #25                   ││││││││  
  24 ─       goto #26                   ││││││││  
  25 ─       goto #26                   ││││││││  
  26 ┄ %38 = φ (#24 => false, #25 => true)::Bool  
  └───       goto #27                   │││││││   
  27 ─       goto #28                   ││││││    
  28 ─       goto #29                   │││││     
  29 ─       goto #31 if not %38        ││││      
  30 ─       goto #32                   ││││      
  31 ─       invoke Base.throw_boundserror(%1::Vector{Int64}, ()::Tuple{})::Union{}
  └───       unreachable                ││││      
  32 ─       nothing::Nothing           │         
  33 ┄ %47 = $(Expr(:boundscheck, false))::Bool      setindex!
  │          Base.arrayset(%47, %1, 1, 1)::Vector{Int64}
  └───       goto #34                   ││╻         _setindex!
  34 ─       goto #35                   ││        
5 35 ─ %51 = $(Expr(:boundscheck, true))::Bool     getindex
  └───       goto #50 if not %51        ││┃         _getindex
  36 ─ %53 = Base.arraysize(%1, 1)::Int64││╻╷╷       checkbounds
  └───       goto #37                   ││││╻╷╷╷      checkbounds
  37 ─       goto #38                   │││││┃││││     checkbounds_indices
  38 ─       goto #39                   ││││││┃││       all
  39 ─ %57 = (%53 === 1)::Bool          │││││││╻╷        _all
  └───       goto #41 if not %57        │││││││╻         _all
  40 ─       goto #42                   ││││││││  
  41 ─       goto #43                   ││││││││  
  42 ─       goto #43                   ││││││││  
  43 ┄ %62 = φ (#41 => false, #42 => true)::Bool  
  └───       goto #44                   │││││││   
  44 ─       goto #45                   ││││││    
  45 ─       goto #46                   │││││     
  46 ─       goto #48 if not %62        ││││      
  47 ─       goto #49                   ││││      
  48 ─       invoke Base.throw_boundserror(%1::Vector{Int64}, ()::Tuple{})::Union{}
  └───       unreachable                ││││      
  49 ─       nothing::Nothing           │         
  50 ┄ %71 = $(Expr(:boundscheck, false))::Bool      getindex
  │    %72 = Base.arrayref(%71, %1, 1)::Int64     
  └───       goto #51                   ││╻         _getindex
  51 ─       goto #52                   ││        
6 52 ─ %75 = Core.tuple(%24, %72)::Tuple{Int64, Int64}
  └───       return %75

Looking at the flags of arrayref,arrayset and arraysize we get 0x00000110, 0x00000100 and 0x00000130 respectively so these don't get touched by this pass anyways (the consistent flag isn't set on any of them).

Aug 31 '23 10:08 Zentrik

Hmm, I may have been wrong, and IR_FLAG_CONSISTENT might be strong enough. I'll need to think about it again to make sure.

Aug 31 '23 10:08 Keno

Does this fix #50887?

Sep 29 '23 11:09 nsajko

Does this fix #50887?

Yes that was the inspiration for this.

Sep 29 '23 11:09 Zentrik

FTR, if you put a Fixes #50887 line into a commit or PR description, Github is supposed to automatically close the issue.

Sep 29 '23 11:09 nsajko

I'm not sure when it would be optimal for this pass to run, the effects of a function may allow a non inlined call to be eliminated completely whilst when inlined only parts may be deleted.

You already give a reason why it's useful before inlining, here's a reason for having the pass run after inlining:

Suppose I'm calling a function whose implementation I can't modify, such as sqrt, but I want some error checks to be eliminated. In the case of sqrt(x) this would be a check that throws in case x < 0, so assume we know 0 ≤ x. This assumption may be encoded in Julia like so: (x < 0) && Core.Intrinsics.llvmcall("unreachable", Union{}, Tuple{}). In this way, one is already able to eliminate the check, but only after LLVM gets involved:

function unsafe_sqrt(x)
  @inline begin
    is_negative = x < zero(x)
    is_negative &&
      Core.Intrinsics.llvmcall("unreachable", Union{}, Tuple{})
    sqrt(x)
  end
end

julia> Base.infer_effects(unsafe_sqrt, Tuple{Float64})   # Julia thinks the function may throw
(+c,!e,!n,!t,!s,!m,!u)′

julia> @code_typed unsafe_sqrt(0.3)
CodeInfo(
1 ─ %1 = Base.lt_float(x, 0.0)::Bool
└──      goto #3 if not %1
2 ─      (Core.Intrinsics.llvmcall)("unreachable", Union{}, Tuple{})::Union{}
└──      unreachable
3 ─ %5 = Base.lt_float(x, 0.0)::Bool
└──      goto #5 if not %5
4 ─      invoke Base.Math.throw_complex_domainerror(:sqrt::Symbol, x::Float64)::Union{}
└──      unreachable
5 ─ %9 = Base.Math.sqrt_llvm(x)::Float64
└──      goto #6
6 ─      return %9
) => Float64

julia> @code_native debuginfo=:none unsafe_sqrt(0.3)
        .text
        .file   "unsafe_sqrt"
        .globl  julia_unsafe_sqrt_7199          # -- Begin function julia_unsafe_sqrt_7199
        .p2align        4, 0x90
        .type   julia_unsafe_sqrt_7199,@function
julia_unsafe_sqrt_7199:                 # @julia_unsafe_sqrt_7199
; Function Signature: unsafe_sqrt(Float64)
# %bb.0:                                # %top
        #DEBUG_VALUE: unsafe_sqrt:x <- $xmm0
        push    rbp
        mov     rbp, rsp
        vsqrtsd xmm0, xmm0, xmm0
        pop     rbp
        ret

So LLVM successfully optimizes the machine code to a single vsqrtsd operation, with no throwing, but Julia doesn't know that unsafe_sqrt doesn't throw. I'm thinking if there was a common-subexpression elimination pass after inlining, it could deduplicate the two Base.lt_float(x, 0.0)::Bool calls, after which it would be easier (although still not implemented, I guess) for Julia to eliminate the throw as unreachable.

Nov 29 '23 20:11 nsajko

@nanosoldier runbenchmarks("inference", vs=":master")

Jul 14 '24 13:07 Zentrik

I think it might be a good idea to run PkgEval on this to root out any more bugs. #51091 and #53844 should speed up this pr by 2-3x.

Jul 14 '24 13:07 Zentrik

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

Jul 14 '24 14:07 nanosoldier

julia julia copied to clipboard

Add a Global Value Numbering Pass

julia
julia copied to clipboard