AMDGPU.jl
at-roc: Add boundscheck flag
Similar to Julia's --check-bounds flag, this flag (used like @roc boundscheck=false ...) allows the user to entirely disable bounds checking within their code.
Testing on MI250x, @roc boundscheck=false does not seem to work.
Testing the following kernel:
function diff2D_step!(T2, T, Ci, lam, dt, _dx, _dy)
    ix = (workgroupIdx().x - 1) * workgroupDim().x + workitemIdx().x
    iy = (workgroupIdx().y - 1) * workgroupDim().y + workitemIdx().y
    if (ix>1 && ix<size(T2,1) && iy>1 && iy<size(T2,2))
        @inbounds T2[ix,iy] = T[ix,iy] + dt*(Ci[ix,iy]*(
                    - ((-lam*(T[ix+1,iy] - T[ix,iy])*_dx) - (-lam*(T[ix,iy] - T[ix-1,iy])*_dx))*_dx
                    - ((-lam*(T[ix,iy+1] - T[ix,iy])*_dy) - (-lam*(T[ix,iy] - T[ix,iy-1])*_dy))*_dy ))
    end
    return
end
- using @inbounds: T_tot diff2D = 1182.223 GB/s
- disabling @inbounds: T_tot diff2D = 635.3999 GB/s
- disabling @inbounds but setting @roc boundscheck=false: T_tot diff2D = 635.7766 GB/s
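To check the kernel's results on small arrays without a GPU, the same stencil can be written as a plain CPU loop. This is only a sketch: the name `diff2D_step_cpu!`, the intermediate names `qx` and `qy`, and the 5x5 test setup are hypothetical and not part of the PR or the reproducer.

```julia
# CPU reference for the diffusion stencil used in the kernel above.
# `diff2D_step_cpu!` is a hypothetical name, not part of AMDGPU.jl.
function diff2D_step_cpu!(T2, T, Ci, lam, dt, _dx, _dy)
    nx, ny = size(T2)
    # Interior points only, matching the kernel's `if` guard.
    for iy in 2:ny-1, ix in 2:nx-1
        # Flux differences in x and y, written as in the kernel.
        qx = (-lam * (T[ix+1, iy] - T[ix, iy]) * _dx) - (-lam * (T[ix, iy] - T[ix-1, iy]) * _dx)
        qy = (-lam * (T[ix, iy+1] - T[ix, iy]) * _dy) - (-lam * (T[ix, iy] - T[ix, iy-1]) * _dy)
        T2[ix, iy] = T[ix, iy] + dt * (Ci[ix, iy] * (-qx * _dx - qy * _dy))
    end
    return nothing
end

# Small illustrative setup: a single hot cell in the middle of a 5x5 grid.
T  = zeros(5, 5); T[3, 3] = 1.0
T2 = zeros(5, 5)
Ci = ones(5, 5)
diff2D_step_cpu!(T2, T, Ci, 1.0, 0.1, 1.0, 1.0)
```

Comparing `Array(T2)` from the GPU run against this reference on a small problem is one way to confirm that disabling bounds checks does not change the results.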
Here is a reproducer: https://gist.github.com/luraess/5e697f857a7aa4d1d00e99ca02cbbb3d (@jpsamaroo)
Running it on LUMI MI250x produces:
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.0-beta4 (2023-02-07)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia> include("amd_bench_inbounds.jl")
Process selecting device 1
Problem size: nx=24575, ny=24575, nz=1, Float64
ROCm grid=(24576, 24576), threads=(128, 2, 1)
T_tot Lap2D inbounds = 1173.402 GB/s
T_tot Lap2D no inbounds = 638.4716 GB/s
T_tot Lap2D boundscheck=false = 639.0676 GB/s
julia>
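The GB/s figures above are memory throughput numbers. The reproducer's exact formula is not shown in this thread; the sketch below assumes a common convention that counts whole-array memory accesses per iteration (for this kernel plausibly 3: T2 written, T and Ci read) and divides by the time per iteration. The helper name `throughput_GBs` and its arguments are hypothetical.

```julia
# Hypothetical helper: memory throughput in GB/s.
# Assumption: each of `n_arrays` arrays of size nx*ny is read or written
# exactly once per iteration; `t_it` is the time per iteration in seconds.
function throughput_GBs(nx, ny, ::Type{DAT}, n_arrays, t_it) where {DAT}
    bytes_per_iter = n_arrays * nx * ny * sizeof(DAT)
    return bytes_per_iter / 1e9 / t_it
end

# Synthetic input for illustration only (not a measurement):
# 3 arrays of 1000x1000 Float64 moved in 0.024 s.
example_GBs = throughput_GBs(1000, 1000, Float64, 3, 0.024)
```

Under this assumption, a kernel that performs extra bounds checks moves the same number of bytes but takes longer per iteration, so the reported GB/s drops.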