effects: add new `@consistent_overlay` macro
This PR serves to replace #51080 and close #52940.
It extends the :nonoverlayed effect to UInt8 and introduces the CONSISTENT_OVERLAY effect bit, which, when set, allows overlay methods to be concretely evaluated using their original, non-overlayed counterparts. Additionally, this PR adds a new effect override called :consistent_overlay.
I've also included a relatively accurate description of :consistent_overlay, as pointed out in #51080. Quoting from the newly added docstrings:
:consistent_overlay

The :consistent_overlay setting asserts that any overlayed methods potentially called by the method are :consistent with their original, non-overlayed counterparts. For the exact definition of :consistent, refer to the earlier explanation.

More formally, when evaluating a generic function call f(x) at a specific world-age i, and the regular method call fᵢ(x) is redirected to an overlay method fᵢ′(x), this setting requires that fᵢ(x) ≡ fᵢ′(x).

!!! note
    Note that the requirements for :consistent-cy include not only that the return values are egal, but also that the manner of termination is the same. However, it's important to be aware that when they throw exceptions, the exceptions themselves don't necessarily have to be egal, as explained in the note of :consistent. In other words, if fᵢ(x) throws an exception, this setting requires fᵢ′(x) to also raise one, but the exact exceptions may differ.

!!! note
    This setting isn't supported at the callsite; it has to be applied at the definition site. Also, given its nature, it's expected to be used together with Base.Experimental.@overlay.
Still, the explanation might not be comprehensive enough. I welcome any feedback.
EDIT: Now this feature is implemented with a new macro @consistent_overlay instead of extending the override set of @assume_effects.
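For readers unfamiliar with overlays, here is a minimal sketch of how the macro might be used. The table name `EXAMPLE_TABLE` and the function `mysin` are hypothetical, and the overlay body is only a host-side stand-in for what would normally be a device-specific implementation. The `@static` guard is an assumption added for portability: it falls back to plain `@overlay` on Julia versions where `@consistent_overlay` isn't defined.

```julia
using Base.Experimental: @MethodTable, @overlay

# Hypothetical overlay method table, for illustration only.
@MethodTable EXAMPLE_TABLE

# The original, non-overlayed method.
mysin(x::Float64) = sin(x)

@static if isdefined(Base.Experimental, Symbol("@consistent_overlay"))
    # Asserts that this overlay is :consistent with `mysin` above, so the
    # compiler may concretely evaluate calls using the original method.
    Base.Experimental.@consistent_overlay EXAMPLE_TABLE function mysin(x::Float64)
        return sin(x)  # stand-in for a device-specific implementation
    end
else
    # Older Julia: register a plain overlay without the consistency assertion.
    @overlay EXAMPLE_TABLE function mysin(x::Float64)
        return sin(x)
    end
end
```

Ordinary dispatch is unaffected by the overlay: `mysin(0.0)` still calls the original method. The overlay only becomes visible to a compiler instance that consults `EXAMPLE_TABLE`, such as a GPU compiler's abstract interpreter.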
could we also add a weaker version of this effect that only requires the overlay to return an egal answer when the regular function returns an answer? (e.g. for things like NaNMath)
That sounds exactly like the requirements for :consistent_overlay. Can you explain how that version is considered weaker?
If I'm reading this right, it wouldn't be valid to use this extension for log_overlay(x) = x < 0 ? NaN : log(x) since log_overlay returns a value rather than throwing for negative numbers.
Ah, you mean @overlay NAN_MATH_TABLE log(x) = x < 0 ? NaN : log(x), right?
We can create such a :nonoverlay effect bit (one that allows concrete evaluation but bails out of it if the original method's execution raises an issue). But that nuanced setting would be more complicated than the already complex :consistent_overlay, so I'd prefer to introduce it only when there's a demonstrated need.
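To make the difference concrete, here is a small host-only sketch, not tied to any overlay table. `log_overlay` is the hypothetical NaNMath-style replacement from the comment above and `termination` is a helper introduced only for this illustration. The two functions agree wherever `log` returns, but they terminate differently for negative inputs, which is what rules out :consistent_overlay here.

```julia
# Hypothetical NaNMath-style replacement for `log`.
log_overlay(x::Float64) = x < 0 ? NaN : log(x)

# Illustrative helper: report how a call terminates.
function termination(f, x)
    try
        f(x)
        return :value
    catch
        return :throw
    end
end

# Same result where the original returns a value ...
println(log(2.0) === log_overlay(2.0))
# ... but a different manner of termination for negative inputs.
println(termination(log, -1.0), " vs ", termination(log_overlay, -1.0))
```

Concretely evaluating `log(-1.0)` in place of the overlay would turn a `NaN` result into a thrown `DomainError`, so the weaker variant would need the bail-out behavior described above.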
I'm ok with this, but it seems a bit weird to put this on @assume_effects. I don't even mind this going into the (internal) EffectsOverride, I'm just thinking in terms of where to put the user-facing part of this. I think it makes more sense as an option to the @overlay macro.
I think it makes more sense as an option to the @overlay macro.
I agree. I've created a new @consistent_overlay macro and included thorough documentation.
Sorry for the slow response, upgrading the GPU stack to LLVM 17 took longer than expected.
Using @consistent_overlay instead of @overlay doesn't seem sufficient to fix the issues in https://github.com/JuliaGPU/CUDA.jl/issues/2241#issue-2086184007. Some have been fixed on 1.12 regardless of this PR, but one that remains:
```julia
using CUDA
cudacall(f, types::Type, args...; kwargs...) = nothing
function outer(f)
    @inline cudacall(f, Tuple{}; stream=Ref(42), shmem=1)
    return
end
using InteractiveUtils
InteractiveUtils.code_llvm(outer, Tuple{Nothing})
# vs
CUDA.code_llvm(outer, Tuple{Nothing})
```
```llvm
; Function Signature: outer(Nothing)
; @ REPL[4]:1 within `outer`
define void @julia_outer_6994() #0 {
top:
; @ REPL[4] within `outer`
  ret void
}
```
vs
```llvm
; @ REPL[4]:1 within `outer`
define void @julia_outer_9973() local_unnamed_addr {
top:
  %jlcallframe1 = alloca [4 x ptr], align 8
; @ REPL[4]:2 within `outer`
; ┌ @ REPL[3]:1 within `cudacall`
; │┌ @ iterators.jl:276 within `pairs`
; ││┌ @ essentials.jl:473 within `Pairs`
; │││┌ @ namedtuple.jl:234 within `eltype`
; ││││┌ @ namedtuple.jl:236 within `nteltype`
; │││││┌ @ tuple.jl:272 within `eltype`
; ││││││┌ @ tuple.jl:292 within `_compute_eltype`
; │││││││┌ @ promotion.jl:175 within `promote_typejoin`
  %0 = load ptr, ptr getelementptr inbounds (i8, ptr @jl_small_typeof, i64 256), align 8
  %1 = call fastcc nonnull ptr @julia_typejoin_9980(ptr readonly %0, ptr readonly inttoptr (i64 128551383252656 to ptr))
; ││││││││ @ promotion.jl:176 within `promote_typejoin`
  %2 = load ptr, ptr getelementptr inbounds (i8, ptr @jl_small_typeof, i64 64), align 8
  store ptr %2, ptr %jlcallframe1, align 8
  %3 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 1
  store ptr %0, ptr %3, align 8
  %4 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 2
  store ptr inttoptr (i64 128551383252656 to ptr), ptr %4, align 8
  %5 = getelementptr inbounds ptr, ptr %jlcallframe1, i64 3
  store ptr %1, ptr %5, align 8
  %6 = call nonnull ptr @jl_f_apply_type(ptr null, ptr nonnull %jlcallframe1, i32 4)
; └└└└└└└└
; @ REPL[4]:3 within `outer`
  ret void
}
```
Do you want me to reduce this to something that doesn't require CUDA.jl?
I'm also wondering if @consistent_overlay isn't a bit specific, and whether this shouldn't be a kwarg to @overlay (@overlay effects=:consistent or something). As proposed in https://github.com/JuliaLang/julia/issues/52940, I would think that consistent overlays pose some danger, i.e., what if the effects of the overlay function aren't identical? It seems tricky to guarantee that without a very good understanding of our optimizer. That's why I assumed that consistent overlays would only be needed for, e.g., GPU intrinsics, while a hypothetical executable overlay effects type, indicating that an overlay method can be safely irinterpreted, would be fine for overlay methods that do not call GPU-only code, or for other non-GPU overlay use cases.
Of course, the above is speculation without a good understanding of the optimizer / our effects system, so feel free to poke holes.
Using @consistent_overlay instead of @overlay doesn't seem sufficient to fix the issues in JuliaGPU/CUDA.jl#2241 (comment). Some have been fixed on 1.12 regardless of this PR, but one that remains:
To fix this exact case, what we actually need is https://github.com/JuliaLang/julia/pull/54323/, not this PR. We could apply @consistent_overlay to Core.throw_inexacterror, since it always throws regardless of where it's executed, but with #54323 the case can be fully optimized without it.
I'm also wondering if @consistent_overlay isn't a bit specific, and whether this shouldn't be a kwarg to @overlay (@overlay effects=:consistent or something). As proposed in #52940, I would think that consistent overlays pose some danger, i.e., what if the effects of the overlay function aren't identical? It seems tricky to guarantee that without a very good understanding of our optimizer. That's why I assumed that consistent overlays would only be needed for, e.g., GPU intrinsics, while a hypothetical executable overlay effects type, indicating that an overlay method can be safely irinterpreted, would be fine for overlay methods that do not call GPU-only code, or for other non-GPU overlay use cases.
If overlay options other than @overlay effects=:consistent are possible, it might be better to use kwargs, but I can't think of any extensions other than consistent_overlay, so I think it's fine to keep it as @consistent_overlay. In any case, these are under Base.Experimental, so we can relatively easily change them if necessary.
It is true that @consistent_overlay requires an understanding of our compiler, but I hope the detailed explanation added in this PR will enable safe usage.
I can't think of any extensions other than consistent_overlay
What about the executable one I suggested above, wouldn't that be a safer option for e.g. CassetteOverlay.jl (where the overlay methods are guaranteed to be executable)?
@consistent_overlay is a subset of what :executable implies, and @consistent_overlay allows the target methods to be executed at compile time if they are :foldable, but I guess I'm missing your point?
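As a rough illustration of what ":foldable" means here, the sketch below asks the compiler for the inferred effects of a trivial function and checks whether they permit concrete evaluation. `add_one` is a hypothetical example function. `Base.infer_effects` and `Core.Compiler.is_foldable` are internal, unexported utilities, so their availability and exact behavior may vary between Julia versions; treat this as an assumption-laden sketch rather than a stable API.

```julia
# Hypothetical example function; any simple pure function would do.
add_one(x::Int) = x + 1

# Internal reflection utility: infer the effects of calling add_one(::Int).
effects = Base.infer_effects(add_one, (Int,))

# Internal query: do these effects make the call eligible for
# concrete evaluation (compile-time folding) on constant arguments?
foldable = Core.Compiler.is_foldable(effects)

println("add_one(::Int) foldable: ", foldable)
```

With @consistent_overlay, the same kind of check applies to the original method: if it is foldable, the compiler can evaluate it at compile time in place of the overlay.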
I guess I'm mixing up effects analysis with concrete evaluation. In any case, I was suggesting a more relaxed version of the @overlay macro where the overlay method wouldn't have to be consistent, instead allowing the compiler to analyze its effects / perform concrete evaluation. That would make it possible to register overlays without having to fully understand the compiler's effects, but also without the performance issues of the current behavior (which IIUC was instated because of the GPU intrinsics that are incompatible with constprop/irinterp).
In any case, for the GPU use case I only care about :consistent, so the current implementation is fine by me.
For me @consistent_overlay and executable are orthogonal.
- consistent_overlay states: you may execute the original method instead of the overlay, and you are guaranteed to obtain the same results.
- executable states: you may execute this function inside of the compiler, and it will return the same as it would on the target.
E.g. we need consistent_overlay for the GPU since we can't execute the actual overlay.
But for use-cases like CassetteOverlays we shouldn't be executing the original method, but rather execute the overlay method during concrete evaluation and there is no consistency requirement.
Thank you for clarifying. That definition of :executable seems to assume that a system for actually executing overlay methods is available to the compiler. Currently, there is no such system, and we would actually need to bring something like CassetteOverlay into Base. And if such a system for executing overlay methods were fully available, I think :executable would no longer be necessary: if the overlay method has good effects, that system could be used for concrete evaluation.
CI failures are unrelated to this PR. Going to merge.