darknet stuck at beginning of execution

Open joarezpj opened this issue 1 year ago • 2 comments

Hi,

I'm using darknet with YOLOv4 for custom object detection in a Google Colab project. Until last week the code was working fine, but now both training and testing get stuck right at the start of execution.

!./darknet detector train data/obj.data /content/drive/MyDrive/IA/Sinterização/yolov4-obj.cfg yolov4.conv.137 -dont_show -map

obj.data file:

classes = 1
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/

The execution simply freezes at the lines shown below:

 CUDA-version: 11010 (11020), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1  
 CUDNN_HALF=1 
 OpenCV version: 3.2.0
 Prepare additional network for mAP calculation...
 0 : compute_capability = 800, cudnn_half = 1, GPU: A100-SXM4-40GB 
net.optimized_memory = 0 
mini_batch = 1, batch = 16, time_steps = 1, train = 0 
   layer   filters  size/strd(dil)      input                output
   0 Create CUDA-stream - 0 

Is there something I should do differently? I haven't changed the code since last week, when it was working fine.

Thanks in advance!

joarezpj · Oct 03 '22 14:10

@joarezpj Did you manage to fix it? I'm stuck with the same problem when trying to run Darknet in Docker (only in Docker; it works when running natively).

The device I'm using is a Jetson Xavier NX Devkit, if that makes a difference 🤷‍♂️

vsaw · Sep 13 '23 18:09

This seems to be an issue with cuDNN initialization. See https://github.com/NVIDIA/nvidia-container-toolkit/issues/124

vsaw · Sep 14 '23 11:09
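
Since the last comment points at cuDNN initialization, it may help to read the versions straight out of the startup banner before digging further. The snippet below is a minimal sketch, not a confirmed diagnosis for this thread: it parses the log lines quoted in the original post and flags one combination that might be worth checking. The function names `parse_darknet_header` and `looks_mismatched` are hypothetical helpers, and the rule itself (compute capability 8.x paired with a cuDNN major version below 8) rests on the assumption that cuDNN releases before 8.0 predate Ampere GPUs.

```python
import re

# Startup banner copied from the original post.
SAMPLE_LOG = """\
 CUDA-version: 11010 (11020), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1
 CUDNN_HALF=1
 OpenCV version: 3.2.0
 0 : compute_capability = 800, cudnn_half = 1, GPU: A100-SXM4-40GB
"""


def parse_darknet_header(log_text):
    """Pull CUDA version, cuDNN version and compute capability out of the banner."""
    info = {}
    m = re.search(r"CUDA-version:\s*(\d+)", log_text)
    if m:
        info["cuda_version"] = int(m.group(1))
    m = re.search(r"cuDNN:\s*(\d+)\.(\d+)\.(\d+)", log_text)
    if m:
        info["cudnn"] = tuple(int(part) for part in m.groups())
    m = re.search(r"compute_capability\s*=\s*(\d+)", log_text)
    if m:
        info["compute_capability"] = int(m.group(1))
    return info


def looks_mismatched(info):
    """Heuristic (assumption, see text above): Ampere-class GPU with cuDNN older than 8."""
    cc = info.get("compute_capability")
    cudnn = info.get("cudnn")
    if cc is None or cudnn is None:
        return False
    return cc >= 800 and cudnn[0] < 8


if __name__ == "__main__":
    header = parse_darknet_header(SAMPLE_LOG)
    print(header)
    if looks_mismatched(header):
        print("cuDNN major version is below 8 on a compute capability 8.x GPU; worth checking.")
```

If the flag triggers, it only suggests where to look first; the container toolkit issue linked above may still be the actual cause when the hang appears only inside Docker.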