cuDNN Error: CUDNN_STATUS_BAD_PARAM while training
While training, at the moment the mAP calculation should start, I get an assertion error:
(next mAP calculation at 1900 iterations)
 1900: 185.299484, 158.218445 avg loss, 0.001000 rate, 6.550167 seconds, 121600 images, 16.111683 hours left
 4cuDNN Error: CUDNN_STATUS_BAD_PARAM: File exists
darknet: ./src/utils.c:331: error: Assertion 0 failed.
I have the same issue. I noticed it happens when I have 2 classes. Right now I am training without the -map flag and it is training fine; it just isn't validating against the 20% of images in test.txt. I don't know if it is linked to Darknet or cuDNN.
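For reference, a sketch of what I'm doing instead (the file names here are placeholders for my own dataset, not the exact paths): train without -map, then compute mAP separately on the saved weights.

```
# train without the -map flag, so the mAP step that crashes is never run:
./darknet detector train data/obj.data cfg/yolo-obj.cfg yolov4.conv.137 -dont_show

# mAP can still be computed afterwards on the weights saved in backup/:
./darknet detector map data/obj.data cfg/yolo-obj.cfg backup/yolo-obj_last.weights
```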
EDIT: Didn't notice this is the pjreddie repo; I'm opening an issue on the AlexeyAB fork.

I got the same issue with a 1080 Ti, built with ZED SDK 3.2 and OpenCV 4.5 with CUDA, training for only 1 class. Driver Version: 455.32.00, CUDA Version: 11.1, cuDNN 8.0.4. The issue didn't happen with CUDA 10.1 on Ubuntu 18 on the same setup without ZED.

/darknet$ git log
commit 14b196d4f2f73fb2f6f8c4019de9a0de00c5a27e
Crash on map:

(next mAP calculation at 1000 iterations)
 1000: 0.831169, 0.674900 avg loss, 0.001000 rate, 4.804868 seconds, 64000 images, 3.878937 hours left
Resizing to initial size: 416 x 416
 try to allocate additional workspace_size = 89.65 MB
 CUDA allocate done!

 calculation mAP (mean average precision)...
 Detection layer: 139 - type = 28
 Detection layer: 150 - type = 28
 Detection layer: 161 - type = 28
 4 cuDNN status Error in: file: /home/patryk/darknet/src/convolutional_kernels.cu : () : line: 533 : build time: Nov 7 2020 - 09:24:25

cuDNN Error: CUDNN_STATUS_BAD_PARAM
cuDNN Error: CUDNN_STATUS_BAD_PARAM: Resource temporarily unavailable
./szkolenie.sh: line 2: 36330 Segmentation fault (core dumped) ./darknet detector train dataset/obj.data dataset/op14.cfg yolov4.conv.137 -map
Was this ever understood and/or addressed?
command: darknet detector train yolov3-ambulance-setup.data yolov3-ambulance-train.cfg ./darknet53.conv.74 -dont_show -map 2> train_log.txt
GPU: NVIDIA RTX 2080 SUPER, OpenCV 4.5.5 with CUDA, training for 1 class. Driver version: 512.15, CUDA version: 11.6, cuDNN version: 8.3.
(next mAP calculation at 100 iterations)
 100: 0.637962, 1.110619 avg loss, 0.001000 rate, 4.220000 seconds, 6400 images, 1.343723 hours left
Resizing to initial size: 416 x 416
 try to allocate additional workspace_size = 154.24 MB
 CUDA allocate done!

 calculation mAP (mean average precision)...
 Detection layer: 82 - type = 28
 Detection layer: 94 - type = 28
 Detection layer: 106 - type = 28

cuDNN status Error in: file: C:\darknet\src\convolutional_kernels.cu : forward_convolutional_layer_gpu() : line: 555 : build time: Apr 8 2022 - 13:14:27

cuDNN Error: CUDNN_STATUS_BAD_PARAM
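For anyone debugging this: CUDNN_STATUS_BAD_PARAM is the generic status cuDNN returns when a descriptor, pointer, or dimension passed to a call such as cudnnConvolutionForward() is invalid. Below is a minimal sketch of how such a status surfaces as a "cuDNN Error: ..." message; this is not Darknet's actual code, just the standard cuDNN API with a deliberately bad parameter:

```c
/* link against libcudnn (and the CUDA runtime), e.g. gcc check_bad_param.c -lcudnn -lcudart */
#include <stdio.h>
#include <stdlib.h>
#include <cudnn.h>

/* Print the cuDNN status string and abort when a call fails,
   similar in spirit to how Darknet reports "cuDNN Error: ...". */
#define CHECK_CUDNN(call)                                             \
    do {                                                              \
        cudnnStatus_t s = (call);                                     \
        if (s != CUDNN_STATUS_SUCCESS) {                              \
            fprintf(stderr, "cuDNN Error: %s (%s:%d)\n",              \
                    cudnnGetErrorString(s), __FILE__, __LINE__);      \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main(void) {
    cudnnHandle_t handle;
    cudnnTensorDescriptor_t desc;

    CHECK_CUDNN(cudnnCreate(&handle));
    CHECK_CUDNN(cudnnCreateTensorDescriptor(&desc));

    /* A negative dimension is rejected with CUDNN_STATUS_BAD_PARAM,
       the same status string printed in the training logs above. */
    CHECK_CUDNN(cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW,
                                           CUDNN_DATA_FLOAT,
                                           1, 3, -1, 416));

    cudnnDestroyTensorDescriptor(desc);
    cudnnDestroy(handle);
    return 0;
}
```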
@Remco-Terwal-Bose see the issue mentioned above, or here
This is my own error message; just when it's about to calculate the mAP, everything crashes:

(next mAP calculation at 1000 iterations)
 H1000/10000: loss=14.6 hours left=4.9
 1000: 14.551660, 16.706905 avg loss, 0.002610 rate, 1.315968 seconds, 64000 images, 4.874600 hours left
 4Darknet error location: ./src/convolutional_kernels.cu, forward_convolutional_layer_gpu(), line #541
cuDNN Error: CUDNN_STATUS_BAD_PARAM: Succes
I also encountered the error when using -map in the command:

cuDNN status Error in: file: ./src/convolutional_kernels.cu function: forward_convolutional_layer_gpu() line: 541
cuDNN Error: CUDNN_STATUS_BAD_PARAM
Darknet error location: ./src/convolutional_kernels.cu, forward_convolutional_layer_gpu(), line #541
These are the solutions I found (a sketch of the config-side ones follows the list):
- Downgrade CUDA link reply7153
- use subdivisions=64 link
- other comment
- export CUDA_VISIBLE_DEVICES=0 link
- GPU architecture link
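For what it's worth, here is a sketch of what the config-side suggestions above look like in practice; the subdivisions value is the one from the linked comment, and the file names are placeholders:

```
# in the [net] section of the .cfg file (e.g. cfg/yolo-obj.cfg):
batch=64
subdivisions=64     # a higher value means a smaller mini-batch per step, so less memory per cuDNN call

# pin Darknet to a single GPU before launching training:
export CUDA_VISIBLE_DEVICES=0
./darknet detector train data/obj.data cfg/yolo-obj.cfg yolov4.conv.137 -map
```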
But still, I was able to run with the following Makefile settings: GPU=1 CUDNN=0 CUDNN_HALF=0 OPENCV=1
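For context, these are the lines I changed at the top of the Makefile before rebuilding with make clean && make. With CUDNN=0, Darknet falls back to its own CUDA convolution kernels, so training is slower, but the cuDNN call that fails is never reached:

```
# Makefile
GPU=1
CUDNN=0          # cuDNN disabled - this is what avoids the CUDNN_STATUS_BAD_PARAM crash for me
CUDNN_HALF=0
OPENCV=1
```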
I don't know; is it correct to do it this way?
@AlexeyAB