Audit uses of 32-bit indexing
We're currently using Int32 indices in some kernels, via the i32 hack, because that often results in significantly better performance. However, GPUs are getting larger, and users are starting to use arrays whose lengths exceed typemax(Int32). This can result in bugs like https://github.com/JuliaGPU/CUDA.jl/issues/1963
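For context, the hack computes kernel indices in Int32 arithmetic, which saves registers on the device but overflows once an array has more than typemax(Int32) elements. A minimal sketch of the pattern (memset32_kernel! is a hypothetical example, not an actual CUDA.jl kernel; it assumes the device intrinsics return Int32, as they do on recent CUDA.jl versions):

using CUDA

function memset32_kernel!(A, x)
    # blockIdx/blockDim/threadIdx return Int32, and subtracting Int32(1)
    # (or CUDA.jl's 1i32 shorthand) keeps the whole computation in 32 bits
    i = (blockIdx().x - Int32(1)) * blockDim().x + threadIdx().x
    # once i exceeds typemax(Int32) it wraps to a negative value, which
    # still passes this guard and then indexes out of bounds under @inbounds
    if i <= length(A)
        @inbounds A[i] = x
    end
    return nothing
end

A = CUDA.fill(1f0, 2^32)
@cuda threads=256 blocks=cld(length(A), 256) memset32_kernel!(A, 2f0)
# for the upper half of A, i has overflowed: elements are skipped or
# written out of bounds instead of being set to 2f0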
We should be more careful about using 32-bit indexing, and probably avoid i32 until we have a better way of deciding which index type to use. Maybe we can add some kind of index_type trait, defaulting to Int but using Int32 when the input arrays allow it, e.g., building on https://github.com/JuliaGPU/CUDA.jl/pull/1895.
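A rough sketch of what that could look like (the index_type function and the I-parameterized kernel below are hypothetical illustrations, not an existing CUDA.jl API):

using CUDA

# default to Int, but allow Int32 when every input is small enough
index_type(As...) =
    all(A -> length(A) <= typemax(Int32), As) ? Int32 : Int

function memset_kernel!(::Type{I}, A, x) where {I<:Integer}
    # all index arithmetic happens in I, chosen at launch time
    i = (I(blockIdx().x) - one(I)) * I(blockDim().x) + I(threadIdx().x)
    if i <= length(A)
        @inbounds A[i] = x
    end
    return nothing
end

A = CUDA.fill(1f0, 2^32)
IndexT = index_type(A)  # Int, since length(A) > typemax(Int32)
@cuda threads=256 blocks=cld(length(A), 256) memset_kernel!(IndexT, A, 2f0)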
Dear CUDA.jl team, I would like to bump this issue. The last couple of generations of GPUs (e.g. the L40S, H100, and H200) have enough memory to handle arrays with more than 2 billion elements.
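For scale, a single 2^32-element Float32 array is 16 GiB:

julia> Base.format_bytes(2^32 * sizeof(Float32))
"16.000 GiB"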
Error 1 (broadcasting)
julia> using CUDA
julia> A = CUDA.fill(1f0, 2^32); A .= 2f0
ERROR: InexactError: trunc(Int32, 4294967296)
Stacktrace:
[1] throw_inexacterror(::Symbol, ::Vararg{Any})
@ Core ./boot.jl:750
[2] checked_trunc_sint
@ ./boot.jl:764 [inlined]
[3] toInt32
@ ./boot.jl:801 [inlined]
[4] Int32
@ ./boot.jl:891 [inlined]
[5] convert
@ ./number.jl:7 [inlined]
[6] cconvert
@ ./essentials.jl:687 [inlined]
[7] macro expansion
@ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:222 [inlined]
[8] macro expansion
@ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/libcuda.jl:5139 [inlined]
[9] #735
@ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:35 [inlined]
[10] check
@ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/libcuda.jl:35 [inlined]
[11] cuOccupancyMaxPotentialBlockSize
@ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:34 [inlined]
[12] launch_configuration(fun::CuFunction; shmem::Int64, max_threads::Int64)
@ CUDA ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/occupancy.jl:61
[13] launch_configuration
@ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/occupancy.jl:56 [inlined]
[14] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
@ CUDA.CUDAKernels ~/.julia/packages/CUDA/1kIOw/src/CUDAKernels.jl:107
Error 2 (filling a large array, no explicit broadcasting)
julia> A = CUDA.fill(true, 2^32);
ERROR: InexactError: trunc(Int32, 4294967296)
Stacktrace:
[1] throw_inexacterror(::Symbol, ::Vararg{Any})
@ Core ./boot.jl:750
[2] checked_trunc_sint
@ ./boot.jl:764 [inlined]
[3] toInt32
@ ./boot.jl:801 [inlined]
[4] Int32
@ ./boot.jl:891 [inlined]
[5] convert
@ ./number.jl:7 [inlined]
[6] cconvert
@ ./essentials.jl:687 [inlined]
[7] macro expansion
@ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:222 [inlined]
[8] macro expansion
@ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/libcuda.jl:5139 [inlined]
[9] #735
@ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:35 [inlined]
[10] check
@ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/libcuda.jl:35 [inlined]
[11] cuOccupancyMaxPotentialBlockSize
@ ~/.julia/packages/CUDA/1kIOw/lib/utils/call.jl:34 [inlined]
[12] launch_configuration(fun::CuFunction; shmem::Int64, max_threads::Int64)
@ CUDA ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/occupancy.jl:61
[13] launch_configuration
@ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/occupancy.jl:56 [inlined]
[14] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
@ CUDA.CUDAKernels ~/.julia/packages/CUDA/1kIOw/src/CUDAKernels.jl:107
[15] fill!(A::CuArray{Bool, 1, CUDA.DeviceMemory}, x::Bool)
@ GPUArrays ~/.julia/packages/GPUArrays/uiVyU/src/host/construction.jl:22
[16] fill
@ ~/.julia/packages/CUDA/1kIOw/src/array.jl:777 [inlined]
[17] macro expansion
@ ~/.julia/packages/CUDA/1kIOw/src/utilities.jl:35 [inlined]
[18] macro expansion
@ ~/.julia/packages/CUDA/1kIOw/src/memory.jl:831 [inlined]
[19] top-level scope
@ ./REPL[114]:1
Some type information was truncated. Use `show(err)` to see complete types.
EDIT: I believe this was fixed a couple of days ago; I'll wait for the next release and re-run my code.
As you noted, those errors are unrelated to this issue, and they are fixed on the master branch.