julia effects: add new `@consistent_overlay` macro

This PR replaces #51080 and closes #52940. It widens :nonoverlayed to a UInt8 and introduces the CONSISTENT_OVERLAY effect bit, which allows calls that dispatch to overlay methods to be concretely evaluated using the original, non-overlayed counterparts. Additionally, this PR adds a new effect override called :consistent_overlay. I've also included a reasonably precise description of :consistent_overlay, as pointed out in #51080. Quoting from the newly added docstrings:

:consistent_overlay

The :consistent_overlay setting asserts that any overlayed methods potentially called by the method are :consistent with their original, non-overlayed counterparts. For the exact definition of :consistent, refer to the earlier explanation.

More formally, when evaluating a generic function call f(x) at a specific world-age i, and the regular method call fᵢ(x) is redirected to an overlay method fᵢ′(x), this setting requires that fᵢ(x) ≡ fᵢ′(x).

!!! note Note that the requirements for :consistent-cy include not only that the return values are egal, but also that the manner of termination is the same. However, it's important to be aware that when they throw exceptions, the exceptions themselves don't necessarily have to be egal, as explained in the note of :consistent. In other words, if fᵢ(x) throws an exception, this setting requires fᵢ′(x) to also raise one, but the exact exceptions may differ.

!!! note This setting isn't supported at the callsite; it has to be applied at the definition site. Also, given its nature, it's expected to be used together with Base.Experimental.@overlay.
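As a rough illustration of the requirement quoted above (this is not part of the PR), the sketch below treats two ordinary Julia functions as the original method and its overlay and checks, at a single argument, that they either return egal values or both throw. The names `outcome`, `consistent_at`, `f_original`, and `f_overlay` are hypothetical.

```julia
# Hypothetical helper: run `f(x)` and record how it terminated.
# Returns (:returned, value) or (:threw, nothing). The exception itself is
# deliberately dropped, since the exceptions need not be egal.
function outcome(f, x)
    try
        return (:returned, f(x))
    catch
        return (:threw, nothing)
    end
end

# Hypothetical pointwise check: same manner of termination, egal return values.
function consistent_at(original, overlay, x)
    (k1, v1) = outcome(original, x)
    (k2, v2) = outcome(overlay, x)
    return k1 === k2 && v1 === v2
end

# Stand-ins for fᵢ and fᵢ′: same results, different exception types.
f_original(x::Int) = x < 0 ? throw(DomainError(x)) : 2x
f_overlay(x::Int) = x < 0 ? throw(ArgumentError("negative input")) : x + x

consistent_at(f_original, f_overlay, 3)   # both return egal values
consistent_at(f_original, f_overlay, -1)  # both throw, with different exceptions
```

A check like this only samples individual arguments. The actual setting is an assertion about every call and world age, so it cannot be established by testing alone.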

Still, the explanation might not be comprehensive enough. I welcome any feedback.

EDIT: Now this feature is implemented with a new macro @consistent_overlay instead of extending the override set of @assume_effects.
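A minimal sketch of how the macro might be used at a definition site, assuming a Julia build that includes this PR. The table name `EXAMPLE_TABLE` and the function `double` are hypothetical, and the `@static` guard falls back to plain `@overlay` on builds where `@consistent_overlay` is not defined.

```julia
using Base.Experimental: @MethodTable, @overlay

# Hypothetical overlay method table, for illustration only.
@MethodTable(EXAMPLE_TABLE)

# Original method.
double(x::Int) = 2x

# Overlay that returns the same value for every Int (including on overflow),
# so asserting consistency with the original is sound here.
@static if isdefined(Base.Experimental, Symbol("@consistent_overlay"))
    Base.Experimental.@consistent_overlay EXAMPLE_TABLE double(x::Int) = x + x
else
    @overlay EXAMPLE_TABLE double(x::Int) = x + x
end
```

Native execution of `double` still uses the original method. The overlay is only visible to a compiler that looks methods up through `EXAMPLE_TABLE`.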

May 01 '24 10:05 aviatesk

could we also add a weaker version of this effect that only requires the overlay to return an egal answer when the regular function returns an answer? (e.g. for things like NaNMath)

May 01 '24 11:05 oscardssmith

That sounds exactly like the requirements for :consistent_overlay. Can you explain how that version is considered weaker?

May 01 '24 12:05 aviatesk

If I'm reading this right, it wouldn't be valid to use this extension for log_overlay(x) = x < 0 ? NaN : log(x) since log_overlay returns a value rather than throwing for negative numbers.

May 01 '24 12:05 oscardssmith

Ah, you mean @overlay NAN_MATH_TABLE log(x) = x < 0 ? NaN : log(x), right? We could create such a :nonoverlay effect bit (one that allows concrete evaluation but bails out of it if executing the original method throws). But that nuanced setting would be more complicated than the already complex :consistent_overlay, so I'd prefer to introduce it only when there's a demonstrated need.
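To make the difference between the two requirements concrete, here is a small sketch that writes the overlay as an ordinary function and compares it with `log` at individual arguments. The helpers `try_call`, `strictly_consistent_at`, and `weakly_consistent_at` are hypothetical and are not part of the PR.

```julia
# The overlay from the discussion, written as an ordinary function.
log_overlay(x) = x < 0 ? NaN : log(x)

# Hypothetical helper: did `f(x)` return normally, and with what value?
function try_call(f, x)
    try
        return (true, f(x))
    catch
        return (false, nothing)
    end
end

# Strict requirement: same manner of termination and egal return values.
function strictly_consistent_at(original, overlay, x)
    (ok1, v1) = try_call(original, x)
    (ok2, v2) = try_call(overlay, x)
    return ok1 == ok2 && v1 === v2
end

# Weaker requirement: constrain the overlay only where the original returns.
function weakly_consistent_at(original, overlay, x)
    (ok1, v1) = try_call(original, x)
    ok1 || return true
    (ok2, v2) = try_call(overlay, x)
    return ok2 && v1 === v2
end

strictly_consistent_at(log, log_overlay, 2.0)   # both return the same value
strictly_consistent_at(log, log_overlay, -1.0)  # log throws, log_overlay returns NaN
weakly_consistent_at(log, log_overlay, -1.0)    # unconstrained where log throws
```

Under the strict check the overlay fails at negative inputs, which is why it would not qualify for `:consistent_overlay` as described, while the weaker check accepts it.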

May 01 '24 13:05 aviatesk

I'm ok with this, but it seems a bit weird to put this on @assume_effects. I don't even mind this going into the (internal) EffectsOverride, I'm just thinking in terms of where to put the user-facing part of this. I think it makes more sense as an option to the @overlay macro.

May 02 '24 06:05 Keno

I think it makes more sense as an option to the @overlay macro.

I agree. I've created a new @consistent_overlay macro and included thorough documentation.

May 08 '24 12:05 aviatesk

Sorry for the slow response, upgrading the GPU stack to LLVM 17 took longer than expected.

Using @consistent_overlay instead of @overlay doesn't seem sufficient to fix the issues in https://github.com/JuliaGPU/CUDA.jl/issues/2241#issue-2086184007. Some have been fixed on 1.12 regardless of this PR, but one that remains:

using CUDA

cudacall(f, types::Type, args...; kwargs...) = nothing

function outer(f)
    @inline cudacall(f, Tuple{}; stream=Ref(42), shmem=1)
    return
end

using InteractiveUtils
InteractiveUtils.code_llvm(outer, Tuple{Nothing})
# vs
CUDA.code_llvm(outer, Tuple{Nothing})

; Function Signature: outer(Nothing)
;  @ REPL[4]:1 within `outer`
define void @julia_outer_6994() #0 {
top:
;  @ REPL[4] within `outer`
  ret void
}

vs

;  @ REPL[4]:1 within `outer`
define void @julia_outer_9973() local_unnamed_addr {
top:
  %jlcallframe1 = alloca [4 x ptr], align 8
;  @ REPL[4]:2 within `outer`
; ┌ @ REPL[3]:1 within `cudacall`
; │┌ @ iterators.jl:276 within `pairs`
; ││┌ @ essentials.jl:473 within `Pairs`
; │││┌ @ namedtuple.jl:234 within `eltype`
; ││││┌ @ namedtuple.jl:236 within `nteltype`
; │││││┌ @ tuple.jl:272 within `eltype`
; ││││││┌ @ tuple.jl:292 within `_compute_eltype`
; │││││││┌ @ promotion.jl:175 within `promote_typejoin`
          %0 = load ptr, ptr getelementptr inbounds (i8, ptr @jl_small_typeof, i64 256), align 8
          %1 = call fastcc nonnull ptr @julia_typejoin_9980(ptr readonly %0, ptr readonly inttoptr (i64 128551383252656 to ptr))
; ││││││││ @ promotion.jl:176 within `promote_typejoin`
          %2 = load ptr, ptr getelementptr inbounds (i8, ptr @jl_small_typeof, i64 64), align 8
          store ptr %2, ptr %jlcallframe1, align 8
          %3 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 1
          store ptr %0, ptr %3, align 8
          %4 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 2
          store ptr inttoptr (i64 128551383252656 to ptr), ptr %4, align 8
          %5 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 3
          store ptr %1, ptr %5, align 8
          %6 = call nonnull ptr @jl_f_apply_type(ptr null, ptr nonnull %jlcallframe1, i32 4)
; └└└└└└└└
;  @ REPL[4]:3 within `outer`
  ret void
}

Do you want me to reduce this to something that doesn't require CUDA.jl?

I'm also wondering if @consistent_overlay isn't a bit specific, and whether this shouldn't be a kwarg to @overlay (@overlay effects=:consistent or something). As proposed in https://github.com/JuliaLang/julia/issues/52940, I would think that consistent overlays pose some danger, i.e., what if the effects of the overlay function aren't identical? It seems tricky to guarantee that without a very good understanding of our optimizer. That's why I assumed that consistent overlays would only be needed for, e.g., GPU intrinsics, while a hypothetical executable overlay effects type indicating that an overlay method can be safely irinterpreted would be fine for overlay methods that do not call GPU-only code, or for other non-GPU overlay use cases.

Of course, the above is speculation without a good understanding of the optimizer / our effects system, so feel free to poke holes.

May 23 '24 07:05 maleadt

Using @consistent_overlay instead of @overlay doesn't seem sufficient to fix the issues in JuliaGPU/CUDA.jl#2241 (comment). Some have been fixed on 1.12 regardless of this PR, but one that remains:

To fix this exact case, what we actually need is https://github.com/JuliaLang/julia/pull/54323/, not this PR. We can @consistent_overlay Core.throw_inexacterror since it would always throw regardless of where it's executed, but with #54323 the case can be fully optimized without it.

I'm also wondering if @consistent_overlay isn't a bit specific, and whether this shouldn't be a kwarg to @overlay (@overlay effects=:consistent or something). As proposed in #52940, I would think that consistent overlays pose some danger, i.e., what if the effects of the overlay function aren't identical? It seems tricky to guarantee that without a very good understanding of our optimizer. That's why I assumed that consistent overlays would only be needed for, e.g., GPU intrinsics, while a hypothetical executable overlay effects type indicating that an overlay method can be safely irinterpreted would be fine for overlay methods that do not call GPU-only code, or for other non-GPU overlay use cases.

If overlay options other than @overlay effects=:consistent are possible, it might be better to use kwargs, but I can't think of any extensions other than consistent_overlay, so I think it's fine to keep it as @consistent_overlay. In any case, these are under Base.Experimental, so we can relatively easily change them if necessary. It is true that @consistent_overlay requires an understanding of our compiler, but I hope the detailed explanation added in this PR will enable safe usage.

May 23 '24 11:05 aviatesk

I can't think of any extensions other than consistent_overlay

What about the executable one I suggested above, wouldn't that be a safer option for e.g. CassetteOverlay.jl (where the overlay methods are guaranteed to be executable)?

May 23 '24 11:05 maleadt

@consistent_overlay is a subset of what :executable implies, and @consistent_overlay allows the target methods to be executed at compile time if they are :foldable, but I guess I'm missing your point?

May 23 '24 12:05 aviatesk

I guess I'm mixing up effects analysis with concrete evaluation. In any case, I was suggesting a more relaxed version of the @overlay macro where the overlay method wouldn't have to be consistent, instead allowing the compiler to analyze its effects / perform concrete evaluation. That would make it possible to register overlays without having to fully understand the compiler's effects, but also without the performance issues of the current behavior (which IIUC was instated because of the GPU intrinsics that are incompatible with constprop/irinterp).

In any case, for the GPU use case I only care about :consistent, so the current implementation is fine by me.

May 23 '24 12:05 maleadt

For me @consistent_overlay and executable are orthogonal.

consistent_overlay states: you may execute the original method instead of the overlay, and you are guaranteed to obtain the same results.
executable states: you may execute this function inside the compiler, and it will return the same result as it would on the target.

E.g. we need consistent_overlay for the GPU since we can't execute the actual overlay. But for use-cases like CassetteOverlays we shouldn't be executing the original method, but rather execute the overlay method during concrete evaluation and there is no consistency requirement.

May 23 '24 14:05 vchuravy

Thank you for clarifying. That definition of :executable seems to assume that a system for actually executing overlay methods is available to the compiler. Currently, there is no such system, and we would actually need to bring something like CassetteOverlay into base. And if such a system for executing overlay methods were fully available, I think :executable would no longer be necessary: if the overlay method has good effects, that system could be used for concrete evaluation.

May 24 '24 08:05 aviatesk

CI failures are unrelated to this PR. Going to merge.

Jun 18 '24 11:06 aviatesk

effects: add new `@consistent_overlay` macro

`:consistent_overlay`