GPUifyLoops.jl

Example showing 3D stencil calculations with a divergence

Open ali-ramadhan opened this issue 6 years ago • 2 comments

This example might be useful to others. I get a speedup of ~130x with this small kernel (compared to a single CPU core, so it's not really a fair comparison), so I think I did it right.

Resolves #12

ali-ramadhan · Feb 19 '19 00:02

    gpuIndex3D() = CartesianIndex(
        blockIdx().z,
        (blockIdx().y - 1) * blockDim().y + threadIdx().y,
        (blockIdx().x - 1) * blockDim().x + threadIdx().x
    )

    # Calculate the divergence of f at every point and store it in div_f.
    @loop for I in (eachindex(f); gpuIndex3D())
        @inbounds div_f[I] = div(f, Nx, Ny, Nz, Δx, Δy, Δz, I)
    end
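For checking the kernel's output, a CPU-only reference can be handy. The sketch below is not the PR's code: `divergence_at` is a hypothetical stand-in for `div`, and both the forward differences and the periodic wrap-around are assumptions made only for illustration. It also iterates over `CartesianIndices(f)` rather than `eachindex(f)`, since the stand-in needs `(i, j, k)` components.

```julia
# Hypothetical stand-in for `div`: forward differences with periodic wrap-around.
# The difference scheme and boundary handling are assumptions, not taken from the PR.
@inline wrap(i, N) = i == N ? 1 : i + 1   # periodic "next" index

function divergence_at(f, Nx, Ny, Nz, Δx, Δy, Δz, I)
    i, j, k = Tuple(I)
    @inbounds begin
        δx = f[wrap(i, Nx), j, k] - f[i, j, k]
        δy = f[i, wrap(j, Ny), k] - f[i, j, k]
        δz = f[i, j, wrap(k, Nz)] - f[i, j, k]
    end
    return δx / Δx + δy / Δy + δz / Δz
end

# CPU reference loop over every point of f, writing the result into div_f.
function divergence_cpu!(div_f, f, Δx, Δy, Δz)
    Nx, Ny, Nz = size(f)
    for I in CartesianIndices(f)
        div_f[I] = divergence_at(f, Nx, Ny, Nz, Δx, Δy, Δz, I)
    end
    return div_f
end

f = rand(8, 8, 8)
div_f = divergence_cpu!(similar(f), f, 1.0, 1.0, 1.0)
```

Under these assumptions a constant field gives exactly zero everywhere, which makes a quick sanity check before comparing against the GPU result.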

vchuravy · Feb 19 '19 14:02

Something like this for the index calculation:

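    maxThreads = 1024  # Upper bound on threads per block.
    Nx, Ny, Nz = size(f)

    # Threads per block: fill x first, then y, then z, within maxThreads in total.
    Tx = min(maxThreads, Nx)
    Ty = min(fld(maxThreads, Tx), Ny)
    Tz = min(fld(maxThreads, Tx * Ty), Nz)

    Bx, By, Bz = cld(Nx, Tx), cld(Ny, Ty), cld(Nz, Tz)  # Blocks in grid.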
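The same calculation can be wrapped in a small function and checked on the CPU without a GPU. This is a minimal sketch: `launch_config`, `max_threads`, and `global_indices` are hypothetical names introduced here, and `global_indices` simply emulates the 1-based formula `(block - 1) * blockDim + thread` along one dimension.

```julia
# Threads per block and blocks per grid for a 3D array of size `dims`.
# `launch_config` and `max_threads` are hypothetical names for this sketch.
function launch_config(dims::NTuple{3,Int}; max_threads::Int = 1024)
    Nx, Ny, Nz = dims
    Tx = min(max_threads, Nx)
    Ty = min(fld(max_threads, Tx), Ny)
    Tz = min(fld(max_threads, Tx * Ty), Nz)
    Bx, By, Bz = cld(Nx, Tx), cld(Ny, Ty), cld(Nz, Tz)
    return (threads = (Tx, Ty, Tz), blocks = (Bx, By, Bz))
end

# Emulate the 1-based global index (block - 1) * blockDim + thread on the CPU.
global_indices(B, T) = [(b - 1) * T + t for b in 1:B for t in 1:T]

cfg = launch_config((256, 256, 256))
```

Because `cld` rounds up, the grid can overshoot when a dimension is not a multiple of the block size. For a size of `(30, 50, 7)`, for example, the y direction gets 2 blocks of 34 threads, so indices 51 to 68 fall outside the array and the kernel would need a bounds check for them.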

vchuravy · Feb 19 '19 14:02