KernelAbstractions.jl

localmem does not work for cpu backend?

Open ww1g11 opened this issue 1 year ago • 1 comments

Hi, I am trying to write a matvec kernel that uses shared memory. It works with the CUDA backend, but it fails with the following error when I switch to the CPU backend:

ERROR: LoadError: UndefVarError: `j` not defined in `Main`
Stacktrace:
 [1] cpu_matvec_kernel!

How can I fix it? Many thanks.

The Julia script is shown below:

using KernelAbstractions
using CUDA
using Test

@kernel function matvec_kernel!(output, @Const(A), @Const(b))
    I = @index(Global, Linear)
    I = div(I-1, 32) + 1
    idx = @index(Local, Linear)
    i = (idx - 1) % 32 + 1  # local index within the warp

    cache_size = @uniform @groupsize()
    cache = @localmem eltype(output) cache_size

    N = size(A, 2)
    sum = zero(eltype(output))
    @inbounds begin
        for J = i:32:N
            sum += A[I, J] * b[J]
        end
        cache[idx] = sum
    end
    @synchronize

    j::Int = 16
    while j > 0
        if i <= j
            @inbounds cache[idx] += cache[idx + j]  # j cannot be found on the CPU backend
        end
        @synchronize
        j = j ÷ 2
    end

    if i == 1
        @inbounds output[I] = cache[idx]
    end
    
end

function matvec!(output, A, b)
    backend = KernelAbstractions.get_backend(A)
    kernel! = matvec_kernel!(backend, 256)
    kernel!(output, A, b; ndrange=32*size(A, 1))
end


m, n = 2^10, 2^10
A = CUDA.rand(Float32, m, n)
b = CUDA.rand(Float32, n)
output = CUDA.rand(Float32, m)

matvec!(output, A, b)
@test isapprox(output, A * b)

matvec!(Array(output), Array(A), Array(b))

The versioninfo() gives:

julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12a (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 24 × 13th Gen Intel(R) Core(TM) i7-13700F
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 24 virtual cores)

julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.6
NVIDIA driver 561.3.0

CUDA libraries:
- CUBLAS: 12.6.3
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+561.3

Julia packages:
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.3+0
- CUDA_Runtime_jll: 0.15.3+0

Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6

1 device:
  0: NVIDIA GeForce RTX 3060 Ti (sm_86, 5.048 GiB / 8.000 GiB available)

ww1g11 Nov 15 '24 14:11

This is https://github.com/JuliaGPU/KernelAbstractions.jl/issues/262: @synchronize does not work within while loops.

You can use OpenCL.jl + POCL_jll to execute this code on the CPU. We are working towards making that the default for KA, which should fix bugs like this one.

vchuravy Nov 30 '24 21:11
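
Workaround sketch (not from the thread above): since the failure comes from @synchronize sitting inside a while loop, one way to sidestep it on the built-in CPU backend is to restructure the kernel so that it has a single top-level @synchronize and no loop around it. In the version below, the first lane of each group of 32 sums its 32 cache entries serially instead of doing a tree reduction. The names matvec_flat_kernel! and matvec_flat! are hypothetical, the hard-coded 256 assumes the workgroup size used in the original script, and the row and lane indices are recomputed after the barrier on the assumption that ordinary local variables do not survive a @synchronize on the CPU backend.

```julia
using KernelAbstractions

# Hypothetical rewrite of the kernel from the issue: one top-level
# @synchronize, no barrier inside a loop.
@kernel function matvec_flat_kernel!(output, @Const(A), @Const(b))
    gidx = @index(Global, Linear)
    idx = @index(Local, Linear)

    # Assumption: must match the workgroup size passed at launch (256).
    cache = @localmem eltype(output) (256,)

    # Phase 1: each of the 32 lanes of a row accumulates a strided partial sum.
    row = div(gidx - 1, 32) + 1
    lane = (idx - 1) % 32 + 1
    acc = zero(eltype(output))
    for J in lane:32:size(A, 2)
        @inbounds acc += A[row, J] * b[J]
    end
    @inbounds cache[idx] = acc

    @synchronize

    # Phase 2: the first lane of each row sums that row's 32 partial sums.
    # Indices are recomputed from gidx and idx rather than reusing row/lane.
    if (idx - 1) % 32 == 0
        total = zero(eltype(output))
        for k in 0:31
            @inbounds total += cache[idx + k]
        end
        @inbounds output[div(gidx - 1, 32) + 1] = total
    end
end

function matvec_flat!(output, A, b)
    backend = KernelAbstractions.get_backend(A)
    kernel! = matvec_flat_kernel!(backend, 256)
    kernel!(output, A, b; ndrange = 32 * size(A, 1))
    KernelAbstractions.synchronize(backend)  # wait for the kernel to finish
    return output
end
```

This trades the log-step reduction for 32 serial additions per row, which is likely slower on a GPU than the original tree reduction, so it is probably best treated as a way to get the CPU path running rather than as a replacement for the CUDA version.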