NNlib.jl
Maxpool misbehaving in some edge cases
I am using NNlibCUDA's maxpool to calculate a sliding window maximum (I know there may be other/better ways of doing it). Unfortunately it fails catastrophically in some interesting cases. I will attach an MWE where I use an (8, 3, 1, 1) CuArray and a (5, 3) kernel, but in reality I use a (320001, 32) CuArray and a (2049, 3) kernel. I do not see the same behaviour when using NNlib with native arrays.
using CUDA
using NNlib
using NNlibCUDA

N = (8, 3, 1, 1)
K = (5, 3)
x = rand(N...)
x_c = CUDA.rand(N...)

# "same"-style padding (k ÷ 2 per side) with stride 1, so the output
# covers every input element
nnlib = maxpool(x, K; pad=Tuple(k ÷ 2 for k ∈ K), stride=(1, 1))
nnlib_cuda = maxpool(x_c, K; pad=Tuple(k ÷ 2 for k ∈ K), stride=(1, 1))

# with stride 1, a sliding-window maximum must preserve the global maximum
@assert maximum(nnlib) == maximum(x)
@assert maximum(nnlib_cuda) == maximum(x_c)
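The invariant those two asserts rely on can be checked with a small CPU-only sketch (written in Python here so it runs without a GPU; `sliding_max` is a hypothetical helper, not part of NNlib): with stride 1, every input element is covered by some window, so the global maximum of the pooled output must equal the global maximum of the input as long as the padding value can never win.

```python
# CPU-only sketch of the invariant the MWE's asserts check: a stride-1
# sliding-window maximum cannot lose the global maximum, because every
# element lies inside at least one window. Padding with -inf plays the
# role of max-pool padding, which should never beat real data.
import math
import random

def sliding_max(x, k):
    """1-D sliding-window maximum, stride 1, pad k // 2 on each side."""
    pad = [-math.inf] * (k // 2)
    xp = pad + list(x) + pad
    return [max(xp[i:i + k]) for i in range(len(xp) - k + 1)]

random.seed(0)
x = [random.random() for _ in range(8)]
y = sliding_max(x, 5)

assert len(y) == len(x)      # "same" padding keeps the length
assert max(y) == max(x)      # the property the @assert lines test
```

So if `maximum(nnlib_cuda) != maximum(x_c)` under these settings, the GPU path has dropped data somewhere.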
From Project.toml:
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
NNlibCUDA = "a00861dc-f156-4864-bf3c-e6376f28a68d"
Please let me know if you need any further information.
I'm not able to replicate this. Can you post the output of CUDA.versioninfo() as well as ] st (Pkg status)? I would also try creating a fresh environment with only CUDA, NNlib and NNlibCUDA to see if that makes a difference.
I couldn't replicate it either (NNlib v0.8.14, NNlibCUDA v0.2.5):
julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 515.65.1, for CUDA 11.7
CUDA driver 11.7
Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+515.65.1
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)
Toolchain:
- Julia: 1.8.4
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
2 devices:
0: NVIDIA GeForce RTX 2080 Ti (sm_75, 10.293 GiB / 11.000 GiB available)
1: NVIDIA GeForce RTX 2080 Ti (sm_75, 10.670 GiB / 11.000 GiB available)
I managed to reproduce with the larger sizes mentioned (the output contains mostly zeros when it shouldn't). If someone can figure out what we're passing to https://github.com/JuliaGPU/CUDA.jl/blob/v3.12.1/lib/cudnn/pooling.jl and whether any of those parameters look incorrect, that would help immensely with fixing this bug.
Can reproduce. The maxima don't differ much (only in the last digit, and not always), but the zeros in the output are reliably wrong at this size, though not at much smaller sizes:
julia> begin
K2 = (300, 1)
N = (300_000, 32, 1, 1)
x_c = CUDA.rand(N...)
nnlib_cuda = maxpool(x_c, K2; stride=1) # slightly simplified
maximum(nnlib_cuda) == maximum(x_c)
end
false
julia> maximum(x_c) => maximum(nnlib_cuda)
0.9999999f0 => 0.9999995f0
julia> count(iszero, x_c) => count(iszero, nnlib_cuda)
0 => 8388608
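For what it's worth, the shape arithmetic makes that zero count look suspicious (this is a hedged observation, not a diagnosis): with window (300, 1), stride 1 and no padding on a (300_000, 32, 1, 1) array, the output has (300000 - 300 + 1) × 32 = 9,590,432 elements, and the 8,388,608 reported zeros are exactly 2^23 of them, which could point at some 32-bit size or launch-configuration limit being hit.

```python
# Output-size arithmetic for the failing case above.
# Window (300, 1), stride 1, no padding, input (300_000, 32, 1, 1):
out_elems = (300_000 - 300 + 1) * (32 - 1 + 1)
print(out_elems)              # total output elements: 9_590_432
print(8_388_608 == 2**23)     # the reported zero count is exactly 2^23
print(8_388_608 / out_elems)  # fraction of the output that is zero
```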
julia> device()
CuDevice(0): Tesla V100-PCIE-16GB
(@v1.10) pkg> st CUDA
Status `~/.julia/environments/v1.10/Project.toml`
[052768ef] CUDA v3.12.1