Adding two CuArrays of different sizes throws no error
Describe the bug
When two CuArrays of different sizes are added, the behavior depends on operand order. If the first array is larger than the second, no error is thrown and a result of unknown contents is silently returned. If the second array is larger than the first, an error is thrown, as it should be. I have observed this with arrays of size N×1.
To reproduce
Silently returns a result:
using CUDA
CuArray(rand(10,1)) + CuArray(rand(3,1))
Throws an error (as it should):
using CUDA
CuArray(rand(3,1)) + CuArray(rand(10,1))
Manifest.toml
[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "Crayons", "DataFrames", "ExprTools", "GPUArrays", "GPUCompiler", "KernelAbstractions", "LLVM", "LLVMLoopInfo", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "NVTX", "Preferences", "PrettyTables", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "StaticArrays", "Statistics"]
git-tree-sha1 = "fdd9dfb67dfefd548f51000cc400bb51003de247"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "5.4.3"

    [deps.CUDA.extensions]
    ChainRulesCoreExt = "ChainRulesCore"
    EnzymeCoreExt = "EnzymeCore"
    SpecialFunctionsExt = "SpecialFunctions"

    [deps.CUDA.weakdeps]
    ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
    EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"
    SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"

[[deps.CUDA_Driver_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "325058b426c2b421e3d2df3d5fa646d72d2e3e7e"
uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc"
version = "0.9.2+0"

[[deps.CUDA_Runtime_Discovery]]
deps = ["Libdl"]
git-tree-sha1 = "33576c7c1b2500f8e7e6baa082e04563203b3a45"
uuid = "1af6417a-86b4-443c-805f-a4643ffb695f"
version = "0.3.5"

[[deps.CUDA_Runtime_jll]]
deps = ["Artifacts", "CUDA_Driver_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "afea94249b821dc754a8ca6695d3daed851e1f5a"
uuid = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
version = "0.14.1+0"

[[deps.CUDNN_jll]]
deps = ["Artifacts", "CUDA_Runtime_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "cbf7d75f8c58b147bdf6acea2e5bc96cececa6d4"
uuid = "62b44479-cb7b-5706-934f-f13b2eb2e645"
version = "9.0.0+1"
Expected behavior
A DimensionMismatch error should be thrown, regardless of operand order.
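For comparison, here is a minimal sketch of the CPU behavior that the GPU path would be expected to match. It uses only Base arrays (no CUDA), and `throws_dimension_mismatch` is a hypothetical helper name introduced purely for illustration.

```julia
# CPU reference: Base arrays reject mismatched shapes in either operand order.
a = rand(10, 1)
b = rand(3, 1)

# Hypothetical helper: returns true if calling `f` throws a DimensionMismatch.
function throws_dimension_mismatch(f)
    try
        f()
        return false
    catch err
        return err isa DimensionMismatch
    end
end

larger_first  = throws_dimension_mismatch(() -> a + b)  # larger array first
smaller_first = throws_dimension_mismatch(() -> b + a)  # smaller array first

println("larger first throws:  ", larger_first)
println("smaller first throws: ", smaller_first)
```

On the CPU both orders are rejected, which is the symmetry missing from the CuArray case reported here.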
Version info
Details on Julia:
Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 20 × 13th Gen Intel(R) Core(TM) i9-13900H
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, goldmont)
  Threads: 24 on 20 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_VSCODE_REPL = 1
  JULIA_NUM_THREADS = 14
Details on CUDA:
CUDA runtime 12.5, artifact installation
CUDA driver 12.5
NVIDIA driver 555.97.0
CUDA libraries:
- CUBLAS: 12.5.3
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+555.97
Julia packages:
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.2+0
- CUDA_Runtime_jll: 0.14.1+0
Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7
1 device:
  0: NVIDIA GeForce RTX 4070 Laptop GPU (sm_89, 3.203 GiB / 7.996 GiB available)
Additional context
I discovered this while debugging a Flux model whose output dimension was not what I expected. Digging deeper led me to this bug.
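When debugging a model, a defensive size check can surface a mismatch like this early instead of letting it pass silently. The following is a minimal sketch, assuming exact size equality is the desired condition (which may be stricter than what `+` itself accepts). `checked_add` is a hypothetical helper, not part of CUDA.jl or Flux. It is shown on CPU arrays; since it relies only on `size` and `+`, it should carry over to CuArrays, though that is untested here.

```julia
# Hypothetical helper (not part of CUDA.jl or Flux):
# add two arrays only if their sizes match exactly, otherwise throw.
function checked_add(a::AbstractArray, b::AbstractArray)
    if size(a) != size(b)
        throw(DimensionMismatch(
            "cannot add arrays of sizes $(size(a)) and $(size(b))"))
    end
    return a + b
end

# Matching sizes: behaves like ordinary addition.
ok = checked_add(ones(3, 1), ones(3, 1))

# Mismatched sizes: rejected regardless of which operand is larger.
rejected = try
    checked_add(ones(10, 1), ones(3, 1))
    false
catch err
    err isa DimensionMismatch
end
```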