ai-research-code
【NVC-Net】RuntimeError: target_specific error in backward_impl. Failed `status == CUDNN_STATUS_SUCCESS`: UNKNOWN
Hi, I am trying to train NVC-Net on a single GPU, but I get the following errors:
value error in query
/home/gitlab-runner/builds/jmdP2aBr/1/nnabla/builders/all/nnabla/include/nbla/function_registry.hpp:69
Failed it != items_.end()
: Any of [cudnn:float, cuda:float, cpu:float] could not be found in []
No communicator found. Running with a single process. If you run this with MPI processes, all processes will perform totally same.
2022-02-15 17:16:13,887 [nnabla][INFO]: Training data with 100 speakers.
2022-02-15 17:16:13,888 [nnabla][INFO]: DataSource with shuffle(True)
2022-02-15 17:16:13,934 [nnabla][INFO]: Using DataIterator
Running epoch=1 lr=0.00010
Error during backward propagation:
Add2CudaCudnn
Add2CudaCudnn
Add2CudaCudnn
MulScalarCuda
MeanCudaCudnn
SquaredErrorCuda
Div2Cuda
PowScalarCuda
SumCuda
AddScalarCuda
PowScalarCuda
ConvolutionCudaCudnn
PadCuda
GELUCuda
ConvolutionCudaCudnn
PadCuda
GELUCuda
ConvolutionCudaCudnn
GELUCuda
Add2CudaCudnn
ConvolutionCudaCudnn
Mul2Cuda
TanhCudaCudnn <-- ERROR
Traceback (most recent call last):
File "main.py", line 99, in status == CUDNN_STATUS_SUCCESS
: UNKNOWN
I followed the install page (https://nnabla.org/install/), but it does not work. Could you please give me some suggestions? My environment is as follows: CUDA 11.0, cuDNN 8.1.0, Python 3.6.8.
Thank you! I look forward to your reply.
Thank you for checking. Forward propagation seems to be working, so I hope there is no problem with the nnabla installation... If you can use Docker, could you please try running it in Docker?
cd nvcnet
./scripts/docker_build.sh
docker run --gpus all -u $(id -u):$(id -g) -v $HOME:$HOME -w $(pwd) --rm -it nvcnet:latest /bin/bash
export NUMBA_CACHE_DIR=/tmp
python main.py -c cudnn -d 0
If you get the same error, could you please provide your GPU information using nvidia-smi -L?
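If it is easier to collect the GPU information from Python, a rough sketch along these lines might help. It only wraps the nvidia-smi -L command mentioned above and is not part of NVC-Net or nnabla; the helper names parse_gpu_list and list_gpus are made up for illustration, and the parser assumes the usual "GPU <index>: <name> (UUID: <uuid>)" line format.

```python
import re
import shutil
import subprocess

# Assumed line format: "GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-1234abcd)"
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: ([^)]+)\)\s*$")


def parse_gpu_list(text):
    """Parse the text printed by `nvidia-smi -L` into a list of dicts.

    Hypothetical helper; lines that do not match the assumed format
    (for example MIG device lines) are skipped.
    """
    gpus = []
    for line in text.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            gpus.append({
                "index": int(match.group(1)),
                "name": match.group(2),
                "uuid": match.group(3),
            })
    return gpus


def list_gpus():
    """Run `nvidia-smi -L`; return [] if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "-L"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # works on Python 3.6 as well
    )
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    for gpu in list_gpus():
        print(gpu["index"], gpu["name"], gpu["uuid"])
```

An empty result means either that nvidia-smi is not on the PATH (for example inside a container started without --gpus all) or that the driver did not report any device, which is worth mentioning when replying.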