[AMDGPU] Broken `sincos` intrinsic on GPUCompiler 0.24
On GPUCompiler 0.23 the kernel outputs:
julia> main()
y = Float32[0.84147096, 0.5403023]
But on GPUCompiler 0.24 (same output for any 0.24.x version):
julia> main()
y = Float32[0.84147096, NaN]
ERROR: AssertionError: sum(y) == sum(sincos(1.0f0))
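The assertion failure follows directly from the NaN, not from a rounding difference: NaN propagates through `sum`, and NaN never compares equal to anything. A minimal CPU-only sketch using the two printed outputs (the variable names `good`, `bad`, `sum_bad` and `nan_positions` are illustrative and not part of the MWE):

```julia
# Values copied from the two runs shown above.
good = Float32[0.84147096, 0.5403023]
bad  = Float32[0.84147096, NaN]

# NaN propagates through the reduction ...
sum_bad = sum(bad)
@show isnan(sum_bad)

# ... and NaN is never `==` to anything, so the `@assert` in the MWE must fail.
@show sum_bad == sum(sincos(1f0))

# Checking each component separately pinpoints which output is broken.
nan_positions = findall(isnan, bad)
@show nan_positions

# The known-good output matches the CPU reference up to rounding.
@show all(good .≈ collect(sincos(1f0)))
```

Only the second element (the cosine) is affected, which is why a per-element check is more informative here than comparing sums.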
MWE:
using Adapt
using AMDGPU
using KernelAbstractions
import KernelAbstractions as KA
@kernel function sc1(y, x)
    i = @index(Global)
    @inbounds s, c = sincos(x[i])
    @inbounds y[1] = s
    @inbounds y[2] = c
end
function main()
    kab = ROCBackend()
    x = adapt(kab, ones(Float32, 1))
    y = KA.allocate(kab, Float32, 2)
    sc1(kab)(y, x; ndrange=1)
    @show y
    @assert sum(y) == sum(sincos(1f0))
    return
end
Also attaching @device_code output for both versions.
In .opt.ll they start to differ at line 288:
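For reference, the first diverging line of two dumps can be located with a few lines of plain Julia. This is a sketch only: `first_diff_line` is a hypothetical helper, the file paths in the comment are placeholders, and a purely textual comparison will also flag harmless differences such as renumbered SSA values.

```julia
# Return the 1-based index of the first line at which two texts differ,
# or `nothing` if they are identical line by line.
function first_diff_line(a::AbstractString, b::AbstractString)
    la = split(a, '\n')
    lb = split(b, '\n')
    n = min(length(la), length(lb))
    for i in 1:n
        la[i] == lb[i] || return i
    end
    # One text is a prefix of the other: they differ right after the shorter one.
    return length(la) == length(lb) ? nothing : n + 1
end

# Hypothetical usage with two dumps saved to disk (paths are placeholders):
# first_diff_line(read("old/kernel.opt.ll", String), read("new/kernel.opt.ll", String))

# Tiny in-memory stand-ins for the two .opt.ll files.
old_ir = "define void @k() {\n  ret void\n}"
new_ir = "define void @k() {\n  unreachable\n}"
@show first_diff_line(old_ir, new_ir)
```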
Which Julia version? If 1.10, I'd suspect the NewPM changes (the only ones that stand out in https://github.com/JuliaGPU/GPUCompiler.jl/compare/v0.23.0...v0.24.5).
Yes, Julia 1.10
Just to confirm: it was the NewPM change. Not sure how relevant this is, but it reproduces even with -O0.
AllocOpt is causing this. We turn
%newstruct32 = call noalias nonnull dereferenceable(4) {} addrspace(10)* @julia.gc_alloc_obj({}** %current_task31, i64 4, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140091797818128 to {}*) to {} addrspace(10)*)) #0, !dbg !187
%33 = addrspacecast {} addrspace(10)* %newstruct32 to {} addrspace(11)*, !dbg !196
%34 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %33) #1, !dbg !196
into
%3 = alloca i32, align 8, addrspace(5)
%4 = bitcast i32 addrspace(5)* %3 to i8 addrspace(5)*
%5 = bitcast i8 addrspace(5)* %4 to {} addrspace(5)*
%newstruct32 = addrspacecast {} addrspace(5)* %5 to {} addrspace(10)*
...
call void @llvm.lifetime.start.p5i8(i64 4, i8 addrspace(5)* %4)
%36 = addrspacecast {} addrspace(10)* %newstruct32 to {} addrspace(11)*, !dbg !196
%37 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %36) #1, !dbg !196
Those address spaces don't seem valid: the alloca lives in addrspace(5) and is then cast to addrspace(10).
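For context, addrspace(5) is private (scratch) memory on AMDGPU, while addrspace(10) and addrspace(11) are Julia's GC-tracked and derived address spaces. The sketch below is a textual heuristic for spotting such casts in an IR dump; `addrspace_casts` is a hypothetical helper, not an existing GPUCompiler or AMDGPU.jl function, and it only sees casts where both sides carry an explicit `addrspace(N)`.

```julia
# Collect (line number, from, to) for every `addrspacecast` whose source and
# destination both spell out an explicit addrspace(N). Text-based heuristic only.
function addrspace_casts(ir::AbstractString)
    cast_re = r"addrspacecast .*?addrspace\((\d+)\).*? to .*?addrspace\((\d+)\)"
    casts = Tuple{Int,Int,Int}[]
    for (n, line) in enumerate(split(ir, '\n'))
        m = match(cast_re, line)
        m === nothing && continue
        push!(casts, (n, parse(Int, m[1]), parse(Int, m[2])))
    end
    return casts
end

# Three lines taken from the transformed IR quoted above.
ir = raw"""
%5 = bitcast i8 addrspace(5)* %4 to {} addrspace(5)*
%newstruct32 = addrspacecast {} addrspace(5)* %5 to {} addrspace(10)*
%36 = addrspacecast {} addrspace(10)* %newstruct32 to {} addrspace(11)*
"""

casts = addrspace_casts(ir)

# Flag casts from AMDGPU private memory (5) into Julia's tracked space (10).
suspicious = filter(c -> c[2] == 5 && c[3] == 10, casts)
@show suspicious
```

The bitcast on the first line is ignored on purpose, since it stays within addrspace(5); only the stack-to-tracked cast is flagged.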
This seems fine on 1.10.5. I think this got fixed in https://github.com/JuliaLang/julia/commit/af9a7af3b27c0ff22179a7dcd30bf6753d3d575f
That commit isn't in 1.10 though, so probably something in GPUCompiler changed too?
It is in 1.10; I think it got a manual backport: https://github.com/JuliaLang/julia/commit/99c4ae46a610a62699a6a59f66d21723734b911f
Oh, great, thanks for confirming.
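A reproducer script could guard against running on a possibly affected Julia build. This is a sketch under one assumption taken from this thread: 1.10.5 is the release reported as working. Which 1.10.x patch release first contained the backport is not stated here, and `possibly_affected` is a hypothetical helper name.

```julia
# Assumption: 1.10.5 is the release reported as working in this thread; the
# first 1.10.x release containing the backport is not confirmed here.
reported_ok = v"1.10.5"

# Hypothetical helper: is `v` on the 1.10 series but older than the reported-good release?
possibly_affected(v::VersionNumber) = v.major == 1 && v.minor == 10 && v < reported_ok

if possibly_affected(VERSION)
    @warn "Julia $VERSION predates $reported_ok; the sincos kernel may return NaN with GPUCompiler 0.24"
end
```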