pytorch-0.4-yolov3

Problem with training YOLOv2

Open mrkieumy opened this issue 6 years ago • 6 comments

Hi @andy-yun, training works normally with YOLOv3, tiny YOLOv3, and tiny YOLOv2. But with the YOLOv2 model, it raises this error:

Traceback (most recent call last):
  File "train.py", line 626, in <module>
    main()
  File "train.py", line 202, in main
    nsamples = train(epoch)
  File "train.py", line 307, in train
    ol=l(output[i]['x'], target)
  File "/home/kieumy/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kieumy/YOLO/pytorch_conditioning/region_layer.py", line 172, in forward
    loss_cls = self.class_scale * nn.CrossEntropyLoss(reduction='sum')(cls, tcls)/nB
  File "/home/kieumy/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kieumy/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 904, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/home/kieumy/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/kieumy/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1790, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: invalid argument 2: non-empty vector or matrix expected at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:32

Do you know the reason why? Thanks.

mrkieumy avatar Apr 12 '19 13:04 mrkieumy

@mrkieumy You should check the tcls value (tcls is the target). My guess is that tcls is empty, which would mean that no target value was assigned.
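A minimal sketch of such a check, assuming cls and tcls have the shapes that CrossEntropyLoss expects (scores of shape (N, num_classes) and integer class indices of shape (N,)). The helper name class_loss and its defaults are hypothetical and are not code from this repository:

```python
import torch
import torch.nn as nn

def class_loss(cls, tcls, class_scale=1.0, nB=1):
    """Hypothetical helper: class loss that tolerates a batch with no matched targets."""
    if tcls.numel() == 0:
        # No ground-truth box was assigned in this batch. Return a zero that is
        # still derived from cls, so backward() works when cls requires grad.
        return cls.sum() * 0.0
    return class_scale * nn.CrossEntropyLoss(reduction='sum')(cls, tcls) / nB

# Empty case: zero rows of scores and zero targets.
empty_loss = class_loss(torch.zeros(0, 20), torch.zeros(0, dtype=torch.long))

# Normal case: two matched boxes with class ids 3 and 7.
normal_loss = class_loss(torch.randn(2, 20), torch.tensor([3, 7]), nB=2)
```

Printing tcls.numel() just before the loss call at region_layer.py line 172 would show whether the failing batch really has no targets.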

andy-yun avatar Apr 13 '19 09:04 andy-yun

@andy-yun Thanks. Do you mean tcls in yolo_layer.py or in region_layer.py? I'm sorry, I don't understand exactly what those tcls values are. Is there a difference between V2 and V3, tinyV3, tinyV2 that requires fixing tcls? I think that if it were wrong for yolov2, it would have to be wrong for tinyv2 too, but the code runs normally with tinyv2. Do you know exactly why? Thanks.

mrkieumy avatar Apr 15 '19 16:04 mrkieumy

@mrkieumy tcls is the class information from the ground truth. If you use yolov2, the tcls in region_layer.py is used; otherwise the tcls in yolo_layer.py is used. The tcls in region_layer.py is compared using CrossEntropyLoss, and the tcls in yolo_layer.py is compared using BCELoss.
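To illustrate the difference between the two loss functions, here is a small sketch with toy values. It assumes the BCELoss targets are one-hot encoded, which is a guess about yolo_layer.py and not something confirmed in this thread; all tensor names and sizes are made up:

```python
import torch
import torch.nn as nn

# Toy values (hypothetical): 3 matched boxes, 4 classes.
logits = torch.randn(3, 4)
class_ids = torch.tensor([0, 2, 3])  # integer class labels, shape (3,)

# CrossEntropyLoss takes raw scores and integer class indices.
ce = nn.CrossEntropyLoss(reduction='sum')(logits, class_ids)

# BCELoss takes probabilities and float targets of the same shape as the input.
probs = torch.sigmoid(logits)
one_hot = torch.zeros_like(probs)
one_hot[torch.arange(3), class_ids] = 1.0
bce = nn.BCELoss(reduction='sum')(probs, one_hot)
```

So the two tcls tensors are not interchangeable: one holds class indices, the other holds per-class values between 0 and 1.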

andy-yun avatar Apr 22 '19 11:04 andy-yun

Hi, I get the same error when training on the VOC dataset, and I think tcls is not empty. I need help. Thanks.

PurpleMStone avatar Apr 24 '19 08:04 PurpleMStone

@mrkieumy @PurpleMStone You should both check your config and data files. The code works well on the COCO and VOC datasets.
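One way to check the data files is sketched below. It assumes darknet-style label files, where each line is "class x_center y_center width height" with normalized coordinates. The function name check_label_file is hypothetical, and the right num_classes depends on your config:

```python
import os

def check_label_file(path, num_classes):
    """Return a list of problems found in one darknet-style label file."""
    if not os.path.isfile(path):
        return ["missing file"]
    with open(path) as f:
        rows = [line.split() for line in f if line.strip()]
    problems = []
    if not rows:
        # An image without any box yields no class targets for the loss.
        problems.append("no boxes")
    for n, row in enumerate(rows, start=1):
        if len(row) != 5:
            problems.append("row %d: expected 5 fields, got %d" % (n, len(row)))
            continue
        try:
            cls_id = int(row[0])
            x, y, w, h = (float(v) for v in row[1:])
        except ValueError:
            problems.append("row %d: non-numeric value" % n)
            continue
        if not 0 <= cls_id < num_classes:
            problems.append("row %d: class id %d outside [0, %d)" % (n, cls_id, num_classes))
        if w <= 0 or h <= 0:
            problems.append("row %d: non-positive box size" % n)
    return problems
```

Running this over every label file listed in the training set, with num_classes set to 20 for VOC, would flag images that have no boxes or have class ids that do not match the config.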

andy-yun avatar Apr 25 '19 05:04 andy-yun

Could there be a bug in the function data_augmentation_nocrop in image.py? When I set crop=True, everything works correctly.

PurpleMStone avatar Apr 25 '19 06:04 PurpleMStone