YOLOV4_MCMOT

CUDA error: device-side assert triggered

Open starsky68 opened this issue 4 years ago • 1 comments

Starting training for 100 epochs...

 Epoch   gpu_mem      GIoU       obj       cls      reid     total   targets  img_size

0%| | 0/1 [00:00<?, ?it/s]
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [4,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [5,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
  File "train.py", line 660, in <module>
    train() # train normally
  File "train.py", line 452, in train
    loss, loss_items = compute_loss_no_upsample(pred, reid_feat_out, targets, track_ids, model)
  File "/home/11/YOLOV4_MCMOT-master/utils/utils.py", line 522, in compute_loss_no_upsample
    return loss, torch.cat((l_box, l_obj, l_cls, l_reid, loss)).detach()
  File "/home/11/.local/lib/python3.8/site-packages/apex-0.1-py3.8.egg/apex/amp/wrap.py", line 81, in wrapper
    return orig_fn(seq, *args, **kwargs)
RuntimeError: CUDA error: device-side assert triggered
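A note on reading this traceback: CUDA kernels are launched asynchronously, so the Python frame that reports a device-side assert (here the torch.cat line) is not necessarily the call that launched the failing kernel. The ClassNLLCriterion messages suggest that a classification loss received a target index outside [0, n_classes). One common way to get a traceback that stops at the offending call is to set the CUDA_LAUNCH_BLOCKING environment variable before CUDA is initialized. A minimal sketch, assuming it is placed at the very top of the training script (that placement is an assumption, not something the repository does):

```python
import os

# Assumption: this runs before torch initializes CUDA, e.g. as the first
# lines of the training script. With CUDA_LAUNCH_BLOCKING=1 kernel launches
# become synchronous, so the error surfaces at the call that caused it.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Setting the variable in the shell (CUDA_LAUNCH_BLOCKING=1 python train.py) should have the same effect. Running a single batch on the CPU is another option and usually gives a more readable out-of-range message.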

The problem above occurs during training. My label format is as follows:
0 1 0.01 0.02 0.03 0.04
1 1 0.014 0.015 0.03 0.016
3 1 0.05 0.06 0.07 0.08
4 1 0.017 0.018 0.019 0.020
2 1 0.09 0.1 0.12 0.13
2 2 0.021 0.022 0.023 0.024

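There are 5 classes in total, so how can an index go out of bounds? It seems to work only with the number of classes minus 1. Is that because there is a background class? @CaptainEven, please help; I would be very grateful.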
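For anyone hitting the same assert: the failing condition, t >= 0 && t < n_classes, applies to every integer target passed to a classification loss, not only to the object class. The sketch below checks the label rows from this post against that condition. It assumes the first column is the class id and the second is an id that may also feed a classification loss; the helper names and the value n_classes=2 in the second call are hypothetical and purely illustrative. How this repository actually sizes its classifiers is not confirmed here.

```python
from typing import List, Tuple


def parse_label_rows(text: str, fields_per_row: int = 6) -> List[List[float]]:
    """Split whitespace-separated label text into rows of `fields_per_row` numbers."""
    values = [float(tok) for tok in text.split()]
    if len(values) % fields_per_row != 0:
        raise ValueError("label text does not divide evenly into rows")
    return [values[i:i + fields_per_row]
            for i in range(0, len(values), fields_per_row)]


def out_of_range_rows(rows: List[List[float]], column: int,
                      n_classes: int) -> List[Tuple[int, int]]:
    """Return (row_index, value) pairs whose index in `column` fails 0 <= t < n_classes."""
    bad = []
    for i, row in enumerate(rows):
        t = int(row[column])
        if not (0 <= t < n_classes):
            bad.append((i, t))
    return bad


# Label rows copied from the post above.
labels = """
0 1 0.01 0.02 0.03 0.04
1 1 0.014 0.015 0.03 0.016
3 1 0.05 0.06 0.07 0.08
4 1 0.017 0.018 0.019 0.020
2 1 0.09 0.1 0.12 0.13
2 2 0.021 0.022 0.023 0.024
"""
rows = parse_label_rows(labels)

# First column against 5 classes: every value lies in 0..4.
print(out_of_range_rows(rows, column=0, n_classes=5))  # []

# Hypothetical: second column against a classifier with 2 outputs.
print(out_of_range_rows(rows, column=1, n_classes=2))  # [(5, 2)]
```

With these rows the first column passes for 5 classes, so if the assert still fires, it may be worth checking which other integer column reaches a classification loss and whether its values start at 0 or at 1.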

starsky68, Nov 07 '20 03:11

Hello, did you solve this issue?

kenrickfernandes, Oct 27 '21 13:10