KernelAbstractions.jl
Slow simple 2D copy kernel with Metal backend
Hi,
I'm trying KA for the first time and I'm wondering about the performance I get for a simple kernel that copies one 2D Float32 matrix into another (I know I could copy them as flat vectors):
using Metal
using KernelAbstractions
using Random
using BenchmarkTools
@kernel function copy2D_kernel!(b, a)
    i, j = @index(Global, NTuple)
    @inbounds b[i, j] = a[i, j]
end

function copy2D!(b, a)
    backend = get_backend(a)
    groupsize = KernelAbstractions.isgpu(backend) ? 256 : 1024
    kernel! = copy2D_kernel!(backend, groupsize)
    kernel!(b, a, ndrange=size(a))
end
function go()
    res = 2^14
    # creating initial cpu arrays
    a_cpu = rand(Float32, res, res)
    b_cpu = zeros(Float32, res, res)
    @info("size of a,b (GB) :", 2sizeof(a_cpu)/(1.e9))
    # creating initial gpu arrays
    a = MtlArray(a_cpu)
    b = MtlArray(b_cpu)
    backend = get_backend(a)
    gpu_elapsed = @belapsed begin
        copy2D!($b, $a)
        KernelAbstractions.synchronize($backend)
    end
    cpu_elapsed = @belapsed $a_cpu .= $b_cpu
    bandwidth_GBs(res, t, T) = sizeof(T)*res*res*2/(t*1.e9)
    @info(cpu_elapsed, bandwidth_GBs(res, cpu_elapsed, Float32))
    @info(gpu_elapsed, bandwidth_GBs(res, gpu_elapsed, Float32))
    nothing
end
And I obtain (MBP M1 Max) a simple CPU copy that is twice as fast as the KA GPU one...
┌ Info: size of a,b (GB) :
└   (2 * sizeof(a_cpu)) / 1.0e9 = 2.147483648
┌ Info: 0.022282291
└   bandwidth_GBs(res, cpu_elapsed, Float32) = 96.37625000050488
┌ Info: 0.047214875
└   bandwidth_GBs(res, gpu_elapsed, Float32) = 45.48320096156137
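For reference, the raw device copy bandwidth could be measured the same way inside go(), by timing Metal's built-in broadcast copy; comparing that number against the KA kernel would show how much of the gap comes from the kernel itself rather than from the hardware. A minimal sketch, using only the calls already shown above:

baseline_elapsed = @belapsed begin
    $b .= $a                                   # plain GPU broadcast copy
    KernelAbstractions.synchronize($backend)   # wait for the copy to finish
end
@info(baseline_elapsed, bandwidth_GBs(res, baseline_elapsed, Float32))  # place after bandwidth_GBs is defined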
Any hints?
Laurent
How do your benchmarks vary with groupsize and res? Are there regions in that space where the GPU is faster?
It looks rather stable for res in {2^15, 2^16} and groupsize in {128, 256, 512, 1024}.
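For completeness, such a sweep could look roughly like this (a sketch: copy2D_gs! is a hypothetical variant of copy2D! that takes the groupsize as an argument, and the parameter ranges are only illustrative, not the exact ones used):

bandwidth_GBs(res, t, T) = sizeof(T) * res * res * 2 / (t * 1.0e9)   # same helper as in go()

# hypothetical variant of copy2D! with an explicit groupsize
function copy2D_gs!(b, a, groupsize)
    backend = get_backend(a)
    kernel! = copy2D_kernel!(backend, groupsize)
    kernel!(b, a, ndrange=size(a))
end

for res in (2^14, 2^15), groupsize in (128, 256, 512, 1024)
    a = MtlArray(rand(Float32, res, res))
    b = MtlArray(zeros(Float32, res, res))
    backend = get_backend(a)
    t = @belapsed begin
        copy2D_gs!($b, $a, $groupsize)
        KernelAbstractions.synchronize($backend)
    end
    @info("res = $res, groupsize = $groupsize", bandwidth_GBs(res, t, Float32))
end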