pytorch-ssd

Retraining RuntimeError: expect torch.LongTensor but found torch.IntTensor

Open hyl-g opened this issue 5 years ago • 1 comments

Hi,

I followed the procedure for retraining mb1-ssd on selected open_images classes and got the following error:

RuntimeError: Expected object of type torch.LongTensor but found type torch.IntTensor for argument #2 'target'.

Any suggestion to fix this problem is welcome.

Here is a capture of the command and the log messages:

(base) D:\pytorch-ssd>python train_ssd.py --num_workers 0 --dataset_type open_images --datasets data/open_images --net mb1-ssd --pretrained_ssd models/mobilenet-v1-ssd-mp-0_675.pth --scheduler cosine --lr 0.01 --t_max 100 --validation_epochs 5 --num_epochs 100 --base_net_lr 0.001 --batch_size 5
2019-08-16 09:22:17,702 - root - INFO - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=5, checkpoint_folder='models/', dataset_type='open_images', datasets=['data/open_images'], debug_steps=100, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=100, num_workers=0, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100.0, use_cuda=True, validation_dataset=None, validation_epochs=5, weight_decay=0.0005)
2019-08-16 09:22:17,702 - root - INFO - Prepare training datasets.
2019-08-16 09:22:20,120 - root - INFO - Dataset Summary:Number of Images: 2444
Minimum Number of Images for a Class: -1
Label Distribution:
Bear: 427
Deer: 3867
2019-08-16 09:22:20,120 - root - INFO - Stored labels into file models/open-images-model-labels.txt.
2019-08-16 09:22:20,120 - root - INFO - Train dataset size: 2444
2019-08-16 09:22:20,120 - root - INFO - Prepare Validation datasets.
2019-08-16 09:22:20,479 - root - INFO - Dataset Summary:Number of Images: 341
Minimum Number of Images for a Class: -1
Label Distribution:
Bear: 73
Deer: 449
2019-08-16 09:22:20,479 - root - INFO - validation dataset size: 341
2019-08-16 09:22:20,494 - root - INFO - Build network.
2019-08-16 09:22:20,588 - root - INFO - Init from pretrained ssd models/mobilenet-v1-ssd-mp-0_675.pth
2019-08-16 09:22:20,635 - root - INFO - Took 0.05 seconds to load the model.
2019-08-16 09:22:20,635 - root - INFO - Learning rate: 0.01, Base net learning rate: 0.001, Extra Layers learning rate: 0.01.
2019-08-16 09:22:20,635 - root - INFO - Uses CosineAnnealingLR scheduler.
2019-08-16 09:22:20,635 - root - INFO - Start training from epoch 0.
d:\miniconda3\lib\site-packages\torch\nn\functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "train_ssd.py", line 319, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 123, in train
    regression_loss, classification_loss = criterion(confidence, locations, labels, boxes)  # TODO CHANGE BOXES
  File "d:\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\pytorch-ssd\vision\nn\multibox_loss.py", line 41, in forward
    classification_loss = F.cross_entropy(confidence.reshape(-1, num_classes), labels[mask], size_average=False)
  File "d:\miniconda3\lib\site-packages\torch\nn\functional.py", line 1550, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "d:\miniconda3\lib\site-packages\torch\nn\functional.py", line 1407, in nll_loss
    return torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Expected object of type torch.LongTensor but found type torch.IntTensor for argument #2 'target'

hyl-g, Aug 16 '19 17:08

The problem was solved by casting labels in functions train() and test() to torch.LongTensor. See the diff below:

diff_train_ssd.txt
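
The attached diff is not reproduced in this thread, so the following is only a minimal sketch of the kind of cast described above, not the actual change. The names confidence, labels and num_classes are taken from the traceback; the tensor shapes, the class count and the int32 input are assumptions made for illustration, and reduction="sum" is used in place of the deprecated size_average=False, as the UserWarning in the log suggests.

```python
import torch
import torch.nn.functional as F

# Hypothetical class count (e.g. background, Bear, Deer); not taken from the repo.
num_classes = 3

# Stand-ins for one batch. The int32 labels mimic the IntTensor from the error message.
confidence = torch.randn(5, 8, num_classes)  # (batch, priors, classes), assumed shape
labels = torch.randint(0, num_classes, (5, 8), dtype=torch.int32)

# The cast: cross_entropy expects int64 (torch.long) class indices as its target.
labels = labels.to(torch.long)

# Same call pattern as in multibox_loss.py, flattened over batch and priors.
loss = F.cross_entropy(
    confidence.reshape(-1, num_classes),
    labels.reshape(-1),
    reduction="sum",
)
```

In train_ssd.py the equivalent step would presumably be applied to the labels tensor inside train() and test() before it is passed to the criterion; where exactly the original diff placed it is not shown here.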

hyl-g, Sep 12 '19 15:09