yolov2-yolov3_PyTorch
Mismatch errors when training Yolo_v2
env
- GPU 2080ti
- ubuntu 16.04
- python 3.6
- pytorch 1.5.0
errors
When I use yolo_v2 to train on the VOC dataset with the parameters set as follows:
import argparse

parser = argparse.ArgumentParser()
# model and dataset
parser.add_argument('-v', '--version', default='yolo_v2',
                    help='yolo_v2, yolo_v3, slim_yolo_v2, tiny_yolo_v3')
parser.add_argument('-d', '--dataset', default='VOC',
                    help='VOC or COCO dataset')
parser.add_argument('-hr', '--high_resolution', action='store_true', default=True,
                    help='use high resolution to pretrain.')
parser.add_argument('-ms', '--multi_scale', action='store_true', default=True,
                    help='use multi-scale trick')
# optimization
parser.add_argument('--batch_size', default=32, type=int,
                    help='Batch size for training')
parser.add_argument('--lr', default=1e-3, type=float,
                    help='initial learning rate')
parser.add_argument('-cos', '--cos', action='store_true', default=False,
                    help='use cos lr')
parser.add_argument('-no_wp', '--no_warm_up', action='store_true', default=False,
                    help='yes or no to choose using warmup strategy to train')
parser.add_argument('--wp_epoch', type=int, default=2,
                    help='The upper bound of warm-up')
parser.add_argument('--dataset_root', default="/home/xxx/tmp/tmp/yolo_v1_v2/data/VOCdevkit",
                    help='Location of VOC root directory')
parser.add_argument('--num_classes', default=20, type=int,
                    help='The number of dataset classes')
parser.add_argument('--momentum', default=0.9, type=float,
                    help='Momentum value for optim')
parser.add_argument('--weight_decay', default=5e-4, type=float,
                    help='Weight decay for SGD')
parser.add_argument('--gamma', default=0.1, type=float,
                    help='Gamma update for SGD')
# runtime and output
parser.add_argument('--num_workers', default=8, type=int,
                    help='Number of workers used in dataloading')
parser.add_argument('--cuda', action='store_true', default=True,
                    help='use cuda.')
parser.add_argument('--save_folder', default='weights/voc/', type=str,
                    help='Directory for saving weights')
parser.add_argument('--tfboard', action='store_true', default=False,
                    help='use tensorboard')
parser.add_argument('--resume', type=str, default=None,
                    help='fine tune the model trained on MSCOCO.')
An error occurred while running the code. It looks like a tensor size mismatch somewhere, but when I set multi-scale to False the code works fine, so I think there must be something wrong with multi-scale training.
[Epoch 1/250][Iter 0/517][lr 0.000000][Loss: obj 433.45 || cls 8.17 || bbox 19.32 || total 460.94 || size 608 || time: 13.28]
[Epoch 1/250][Iter 10/517][lr 0.000000][Loss: obj 431.94 || cls 7.31 || bbox 13.79 || total 453.04 || size 608 || time: 9.52]
Traceback (most recent call last):
File "/home/xxx/tmp/tmp/yolo_v1_v2/train_voc.py", line 294, in <module>
train()
File "/home/xxx/tmp/tmp/yolo_v1_v2/train_voc.py", line 245, in train
conf_loss, cls_loss, txtytwth_loss, total_loss = model(images, target=targets)
File "/home/xxx/anaconda3/envs/pytorch1.5.0/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/xxx/tmp/tmp/yolo_v1_v2/models/yolo_v2.py", line 217, in forward
x1y1x2y2_pred = (self.decode_boxes(txtytwth_pred) / self.scale_torch).view(-1, 4)
File "/home/xxx/tmp/tmp/yolo_v1_v2/models/yolo_v2.py", line 90, in decode_boxes
xywh_pred = self.decode_xywh(txtytwth_pred)
File "/home/xxx/tmp/tmp/yolo_v1_v2/models/yolo_v2.py", line 74, in decode_xywh
xy_pred = torch.sigmoid(txtytwth_pred[:, :, :, :2]) + self.grid_cell
RuntimeError: The size of tensor a (361) must match the size of tensor b (100) at non-singleton dimension 1
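As a quick check on the error message, the two sizes line up with square grids at two different input resolutions, assuming the network downsamples the input by a stride of 32. The stride value and the helper name `num_grid_cells` are assumptions for illustration and are not taken from the repository:

```python
STRIDE = 32  # assumption: total downsampling factor of the backbone


def num_grid_cells(input_size, stride=STRIDE):
    """Number of cells in the square output grid for a square input."""
    side = input_size // stride
    return side * side


print(num_grid_cells(608))  # 361, matches "tensor a" (the predictions)
print(num_grid_cells(320))  # 100, matches "tensor b" (self.grid_cell)
```

If that reading is right, the predictions came from a batch that was still 608 pixels wide, while `self.grid_cell` had already been rebuilt for a 320-pixel input.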
Yes, the problem is with multi-scale training.
I have reported this bug in my README.
I have no idea how to fix this properly for now. Maybe I will have to rewrite the dataset file that loads the data for training.
If you want to use the multi-scale training trick, you have to set num_workers to 0, which means training will take more time. On COCO, I spent about 2 weeks training my YOLOv3.
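One possible direction for the dataset rewrite, offered only as a sketch and not as the repository's method: choose the input size once per batch in the main process and send it to the workers together with the sample index, so a batch can never mix sizes or lag behind the model. The class name `MultiScaleBatchSampler`, its parameters, and the size range 320 to 608 are all hypothetical.

```python
import random


class MultiScaleBatchSampler:
    """Yield batches of (sample_index, input_size) pairs.

    Hypothetical helper: the size is drawn once per batch, so every
    sample in a batch shares it no matter which worker loads it.
    """

    def __init__(self, num_samples, batch_size, sizes, change_every=10, seed=0):
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.sizes = list(sizes)
        self.change_every = change_every
        self.rng = random.Random(seed)

    def __len__(self):
        # Batches per epoch; the last, possibly smaller, batch is kept.
        return (self.num_samples + self.batch_size - 1) // self.batch_size

    def __iter__(self):
        order = list(range(self.num_samples))
        self.rng.shuffle(order)
        size = self.sizes[-1]  # start at the largest size
        for b in range(len(self)):
            # Draw a new size every `change_every` batches.
            if b > 0 and b % self.change_every == 0:
                size = self.rng.choice(self.sizes)
            chunk = order[b * self.batch_size:(b + 1) * self.batch_size]
            yield [(idx, size) for idx in chunk]
```

An object like this could be passed to `torch.utils.data.DataLoader` through its `batch_sampler` argument, with the dataset's `__getitem__` accepting the `(index, size)` pair and resizing the image accordingly. The training loop would then read the size from the batch itself (for example `images.shape[-1]`) before rebuilding the grid, instead of relying on a separate iteration counter. Whether this fits the existing dataset and target-encoding code is untested.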