KernelAbstractions.jl
Variable scoping issue leading to unexpected UndefVarError on CPU
I have encountered what I think is a variable scoping issue that causes one of my KernelAbstractions kernels to fail when executing on the CPU. (GPU execution is fine.) I'm using KernelAbstractions v0.9.6 in Julia 1.9.2. Here's a minimal example that triggers the problem:
using KernelAbstractions

@kernel function mykernel(x)
    i = @index(Global, Linear)
    _, Nblocks = @ndrange()
    @inbounds begin
        id = Nblocks
        @synchronize
        x[i] = 1.0
    end
end
x = ones(256, 1)
backend = get_backend(x)
kernel! = mykernel(backend, (256,))
kernel!(x, ndrange = (256, 1))
When I run this code, it fails with:
ERROR: LoadError: UndefVarError: `Nblocks` not defined
Stacktrace:
[1] cpu_mykernel
@ ~/.julia/packages/KernelAbstractions/lhhMo/src/macros.jl:276 [inlined]
[2] cpu_mykernel(__ctx__::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.NoDynamicCheck, CartesianIndex{2}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(256, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}}, x::Matrix{Float64})
@ Main ./none:0
[3] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(256,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_mykernel)}, ndrange::Tuple{Int64, Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(256, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}, args::Tuple{Matrix{Float64}}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck)
@ KernelAbstractions ~/.julia/packages/KernelAbstractions/lhhMo/src/cpu.jl:115
[4] __run(obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(256,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_mykernel)}, ndrange::Tuple{Int64, Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(256, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}, args::Tuple{Matrix{Float64}}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck, static_threads::Bool)
@ KernelAbstractions ~/.julia/packages/KernelAbstractions/lhhMo/src/cpu.jl:82
[5] (::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(256,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_mykernel)})(args::Matrix{Float64}; ndrange::Tuple{Int64, Int64}, workgroupsize::Nothing)
@ KernelAbstractions ~/.julia/packages/KernelAbstractions/lhhMo/src/cpu.jl:44
[6] top-level scope
@ ~/debug.jl:19
[7] include(fname::String)
@ Base.MainInclude ./client.jl:478
[8] top-level scope
@ REPL[1]:1
The compiler thinks that the variable Nblocks in the id = Nblocks line is not defined, even though it clearly is defined via the call to @ndrange. When I inspect the generated kernel code with code_lowered(), I see:
julia> code_lowered(kernel!.f)
1-element Vector{Core.CodeInfo}:
CodeInfo(
[...]
5 ── i@_14 = KernelAbstractions.__index_Global_Linear(__ctx__, I#301)
│ %25 = (KernelAbstractions.ndrange)(__ctx__)
│ %26 = (size)(%25)
│ %27 = Base.indexed_iterate(%26, 1)
│ Core.getfield(%27, 1)
│ @_11 = Core.getfield(%27, 2)
│ %30 = Base.indexed_iterate(%26, 2, @_11)
└─── Nblocks = Core.getfield(%30, 1)
[...]
12 ─ i@_17 = KernelAbstractions.__index_Global_Linear(__ctx__, I#303)
└─── id = Main.Nblocks
[...]
)
The code in block 5 shows that Nblocks is getting set OK, but the code in block 12 shows that when the id = Nblocks line gets translated, the compiler looks for a definition of Nblocks in the Main module, where it does not exist. (I redacted this listing for readability. I'm happy to provide the full listing if that would be helpful.)
The issue disappears if I remove the call to @synchronize.
Any thoughts here?
EDIT: This is probably related to (maybe even a duplicate of) #274. Also, another way I can get the issue to disappear is to move the call to @ndrange that defines Nblocks inside the @inbounds begin ... end block.
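That is, a variant along these lines runs without the error:

@kernel function mykernel(x)
    i = @index(Global, Linear)
    @inbounds begin
        # Defining Nblocks inside the @inbounds block (rather than
        # before it) also makes the UndefVarError go away:
        _, Nblocks = @ndrange()
        id = Nblocks
        @synchronize
        x[i] = 1.0
    end
end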
Yeah, this is expected, and it is the reason the @uniform macro is needed:
https://juliagpu.github.io/KernelAbstractions.jl/api/#KernelAbstractions.@uniform
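For the kernel above, that means hoisting the ndrange query out of the per-workitem scope, something like:

@kernel function mykernel(x)
    i = @index(Global, Linear)
    # @uniform evaluates this once per workgroup, so the value is
    # still available after the split introduced by @synchronize:
    @uniform Nblocks = @ndrange()[2]
    @inbounds begin
        id = Nblocks
        @synchronize
        x[i] = 1.0
    end
end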
OK, yeah: using @uniform fixes it. I guess what confuses me here is that I didn't need to do that for the variables declared with @index. (Indeed, putting indices inside a @uniform block triggers a different error.) But I can work with that. Thanks!
Yeah, the CPU lowering is a bit tricky and doesn't have the best errors.
Hey, I need help here please!
No matter whether I use @uniform or @private on the index, or leave them out, I can't run the code. Here are the variants I tried and the errors they produce.

This:

index = @index(Global)
@uniform tid = index - 1

fails with:

LoadError: UndefVarError: index not defined

This:

index = @uniform @index(Global)
@uniform tid = index - 1

fails with:

LoadError: MethodError: no method matching __index_Global_Linear(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.NoDynamicCheck, CartesianIndex{2}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}})

And this:

@uniform index = @index(Global)
@uniform tid = index - 1

fails with the same MethodError.
Any ideas? Thanks!
@ManuelCostanzo please make it easier to help you by formatting your post.
I am unsure what you want to achieve. By definition, @uniform and @index are incompatible.
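@uniform hoists a computation out of the per-workitem scope, so it only makes sense for values that are the same for every workitem. Roughly:

# Fine: the ndrange is the same for every workitem, so a value
# derived from it can be hoisted with @uniform:
@uniform Nblocks = @ndrange()[2]

# Not meaningful: @index(Global) differs for every workitem, so there
# is no single uniform value to hoist (this combination produces the
# errors shown above):
@uniform index = @index(Global)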
Hi @vchuravy, I just need to access the thread and block index values after @synchronize. On the CPU, the only way to do that (as far as I know) is to declare the variables with @uniform or @private, but I can't get that to work: with @uniform I get an "id not defined" error, and without it I get "varr not defined" instead. Here is an example:
using KernelAbstractions, CUDA
const backend = CPU()
const BLOCK_SIZE = 4
@kernel function kk(A, B, C)
    id = @index(Global)  # I tried using @uniform and @private
    @uniform varr = id   # I tried using @private too
    for i in 1:varr
        for j in 1:varr
            @synchronize()
            C[i, j] = 0
            for k in 1:varr
                C[i, j] += A[i, k] * B[k, j]
            end
        end
    end
end
function run_gpu()
    m = 10
    n = 20
    # Initialize the matrices on the GPU
    A = KernelAbstractions.zeros(backend, Int, m, n)
    B = KernelAbstractions.zeros(backend, Int, m, n)
    C = KernelAbstractions.zeros(backend, Int, m, n)
    # Compute the block size
    block_size = BLOCK_SIZE
    mn = max(m, n)
    if mn < BLOCK_SIZE
        block_size = mn
    end
    # Compute the number of blocks
    total_blocks = (mn + block_size - 1) ÷ block_size
    # Anti-diagonal loop
    @time @inbounds for diag in 0:(2*total_blocks-1)
        # Number of blocks to launch on this anti-diagonal
        num_blocks_diagonal = min(diag + 1, 2 * total_blocks - diag - 1)
        kernel! = kk(backend)
        kernel!(A, B, C, ndrange = (block_size * block_size, num_blocks_diagonal), workgroupsize = block_size * block_size)
        KernelAbstractions.synchronize(backend)
    end
end
run_gpu()