yolov2-yolov3_PyTorch

Mismatch errors in training Yolo_v2

Open bruce1408 opened this issue 5 years ago • 1 comments

env

  • GPU 2080ti
  • ubuntu 16.04
  • python 3.6
  • pytorch 1.5.0

errors

When I use yolo-v2 to train on the VOC dataset with the parameters set as follows:

    parser.add_argument('-v', '--version', default='yolo_v2',
                        help='yolo_v2, yolo_v3, slim_yolo_v2, tiny_yolo_v3')

    parser.add_argument('-d', '--dataset', default='VOC',
                        help='VOC or COCO dataset')

    parser.add_argument('-hr', '--high_resolution', action='store_true', default=True,
                        help='use high resolution to pretrain.')

    parser.add_argument('-ms', '--multi_scale', action='store_true', default=True,
                        help='use multi-scale trick')

    parser.add_argument('--batch_size', default=32, type=int,
                        help='Batch size for training')

    parser.add_argument('--lr', default=1e-3, type=float,
                        help='initial learning rate')

    parser.add_argument('-cos', '--cos', action='store_true', default=False,
                        help='use cos lr')

    parser.add_argument('-no_wp', '--no_warm_up', action='store_true', default=False,
                        help='yes or no to choose using warmup strategy to train')

    parser.add_argument('--wp_epoch', type=int, default=2,
                        help='The upper bound of warm-up')

    parser.add_argument('--dataset_root', default="/home/xxx/tmp/tmp/yolo_v1_v2/data/VOCdevkit",
                        help='Location of VOC root directory')

    parser.add_argument('--num_classes', default=20, type=int,
                        help='The number of dataset classes')

    parser.add_argument('--momentum', default=0.9, type=float,
                        help='Momentum value for optim')

    parser.add_argument('--weight_decay', default=5e-4, type=float,
                        help='Weight decay for SGD')

    parser.add_argument('--gamma', default=0.1, type=float,
                        help='Gamma update for SGD')

    parser.add_argument('--num_workers', default=8, type=int,
                        help='Number of workers used in dataloading')

    parser.add_argument('--cuda', action='store_true', default=True,
                        help='use cuda.')

    parser.add_argument('--save_folder', default='weights/voc/', type=str,
                        help='Gamma update for SGD')

    parser.add_argument('--tfboard', action='store_true', default=False,
                        help='use tensorboard')

    parser.add_argument('--resume', type=str, default=None,
                        help='fine tune the model trained on MSCOCO.')

An error occurred while running the code. It looks like a size mismatch somewhere, but when I set multi-scale to False, the code works well. So I think there must be something wrong with multi-scale.

[Epoch 1/250][Iter 0/517][lr 0.000000][Loss: obj 433.45 || cls 8.17 || bbox 19.32 || total 460.94 || size 608 || time: 13.28]
[Epoch 1/250][Iter 10/517][lr 0.000000][Loss: obj 431.94 || cls 7.31 || bbox 13.79 || total 453.04 || size 608 || time: 9.52]
Traceback (most recent call last):
  File "/home/xxx/tmp/tmp/yolo_v1_v2/train_voc.py", line 294, in <module>
    train()
  File "/home/xxx/tmp/tmp/yolo_v1_v2/train_voc.py", line 245, in train
    conf_loss, cls_loss, txtytwth_loss, total_loss = model(images, target=targets)
  File "/home/xxx/anaconda3/envs/pytorch1.5.0/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxx/tmp/tmp/yolo_v1_v2/models/yolo_v2.py", line 217, in forward
    x1y1x2y2_pred = (self.decode_boxes(txtytwth_pred) / self.scale_torch).view(-1, 4)
  File "/home/xxx/tmp/tmp/yolo_v1_v2/models/yolo_v2.py", line 90, in decode_boxes
    xywh_pred = self.decode_xywh(txtytwth_pred)
  File "/home/xxx/tmp/tmp/yolo_v1_v2/models/yolo_v2.py", line 74, in decode_xywh
    xy_pred = torch.sigmoid(txtytwth_pred[:, :, :, :2]) + self.grid_cell
RuntimeError: The size of tensor a (361) must match the size of tensor b (100) at non-singleton dimension 1
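
For readers of the traceback, the two sizes in the error can be tied back to input resolutions with a little arithmetic. This is only a sketch: it assumes the usual YOLOv2 output stride of 32, which is not stated anywhere in this thread, and `num_grid_cells` is a hypothetical helper, not a function from the repository. Under that assumption, 361 cells matches the logged "size 608", while 100 cells would correspond to a 320-pixel input, which suggests the grid and the image batch were built for different sizes.

```python
def num_grid_cells(input_size: int, stride: int = 32) -> int:
    """Number of cells in a square output grid for a square input.

    stride=32 is an assumption (the usual YOLOv2 downsampling factor).
    """
    cells_per_side = input_size // stride
    return cells_per_side * cells_per_side


# The log reports "size 608"; this matches tensor a in the error message.
print(num_grid_cells(608))  # 361 (19 x 19)

# A 320-pixel input would give the 100 cells of tensor b.
print(num_grid_cells(320))  # 100 (10 x 10)
```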

bruce1408 Sep 05 '20 12:09

Yes.

I have reported this bug in my README.

I don't know how to fix this properly for now. I may have to rewrite the dataset file that loads the training data.

If you want to use the multi-scale training trick, you have to set num_workers to 0, which means training will take more time. On COCO, I spent about 2 weeks training my YOLOv3.
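
Until there is a proper fix, a small guard before the forward pass can at least turn the broadcast failure into a clearer message. The sketch below is not code from this repository: `check_grid_matches_input` is a hypothetical helper, and the stride of 32 is an assumption. It only compares the batch's actual height and width against the number of grid cells the model currently holds.

```python
def check_grid_matches_input(image_hw, grid_cells, stride=32):
    """Raise a readable error if the model grid does not fit the batch.

    image_hw   -- (height, width) of the images in the current batch
    grid_cells -- number of cells in the grid the model currently holds
    stride     -- assumed downsampling factor of the network (assumption: 32)
    """
    height, width = image_hw
    expected = (height // stride) * (width // stride)
    if expected != grid_cells:
        raise ValueError(
            f"input {height}x{width} needs {expected} grid cells at stride "
            f"{stride}, but the model grid has {grid_cells}"
        )
    return expected


# Matching sizes pass silently and return the cell count.
check_grid_matches_input((608, 608), 361)

# The combination from the traceback is rejected with a descriptive message.
try:
    check_grid_matches_input((608, 608), 100)
except ValueError as err:
    print(err)
```

In a training loop, such a check would be called with the batch's spatial shape right before `model(images, target=targets)`, so a mismatch is reported with both sizes rather than surfacing deep inside `decode_xywh`.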

yjh0410 Sep 08 '20 07:09