sdustdk1427 opened this issue 5 years ago
After training reached 000035.weights, an error occurred and I don't know why. I have set the image size in the cfg to 416x416, and my PyTorch version is 1.0.1. Please help me solve this issue. Thank you very much.
2019-05-09 17:08:44 [035] training with 49.642771 samples/s
2019-05-09 17:08:44 save weights to backup/000035.weights
2019-05-09 17:08:44 [036] processed 133992 samples, lr 1.000000e-03
Traceback (most recent call last):
  File "train.py", line 375, in <module>
    main()
  File "train.py", line 156, in main
    nsamples = train(epoch)
  File "train.py", line 219, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 209, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 416 and 480 in dimension 2 at /opt/conda/conda-bld/pytorch_1550780889552/work/aten/src/TH/generic/THTensorMoreMath.cpp:1307
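The final RuntimeError is raised by torch.stack inside default_collate: every image tensor in one batch must have exactly the same shape. A minimal sketch that reproduces the message with dummy tensors (try_stack is a hypothetical helper, not part of the repository):

```python
import torch


def try_stack(tensors):
    """Stack CHW image tensors along a new dim 0, as default_collate does.

    Returns the stacked batch, or None if the shapes do not match.
    """
    try:
        return torch.stack(tensors, 0)
    except RuntimeError as err:
        print("stack failed:", err)
        return None


# Two 3x416x416 images stack into one 2x3x416x416 batch.
same = [torch.zeros(3, 416, 416), torch.zeros(3, 416, 416)]
print(try_stack(same).shape)  # torch.Size([2, 3, 416, 416])

# A 416 image next to a 480 image triggers the size-mismatch error from the log.
mixed = [torch.zeros(3, 416, 416), torch.zeros(3, 480, 480)]
print(try_stack(mixed))  # prints the error message, then None
```

The old TH message reports "dimension 2" because the tensors are first given a batch dimension, so dimension 2 is the image height.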
@sdustdk1427 This is the same error as #55.
I have updated dataset.py and train.py; please try the new code.
Also refer to https://discuss.pytorch.org/t/runtimeerror-invalid-argument-0-sizes-of-tensors-must-match-except-in-dimension-0-got-3-and-2-in-dimension-1/23890/15
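For context, DataLoader accepts a collate_fn argument that turns a list of samples into a batch; when it is omitted, PyTorch's default_collate is used, which is the function failing in the first traceback. The later traceback shows that the updated dataset.py defines a custom_collate that calls torch.stack on the images. The sketch below only illustrates how such a function plugs into DataLoader; the dataset class and the fixed-length target layout (50 boxes x 5 values) are illustrative assumptions, not the repository's code. A collate function that still calls torch.stack does not by itself avoid a size mismatch.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class DummyDetectionDataset(Dataset):
    """Hypothetical stand-in for the repo's dataset: yields (image, target) pairs."""

    def __init__(self, n=4, size=416, max_boxes=50):
        self.n, self.size, self.max_boxes = n, size, max_boxes

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        img = torch.rand(3, self.size, self.size)
        # Assumed fixed-length target: max_boxes rows of (class, x, y, w, h), zero-padded.
        target = torch.zeros(self.max_boxes * 5)
        return img, target


def custom_collate(batch):
    """Stack images and targets separately into batch tensors."""
    data = torch.stack([item[0] for item in batch], 0)
    target = torch.stack([item[1] for item in batch], 0)
    return data, target


loader = DataLoader(DummyDetectionDataset(), batch_size=2, collate_fn=custom_collate)
for data, target in loader:
    print(data.shape, target.shape)  # torch.Size([2, 3, 416, 416]) torch.Size([2, 250])
```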
Today I used your new dataset.py and train.py, but after 000030.weights was saved I hit this problem again!
I read https://discuss.pytorch.org/t/runtimeerror-invalid-argument-0-sizes-of-tensors-must-match-except-in-dimension-0-got-3-and-2-in-dimension-1/23890/15, but I couldn't understand it, sorry.
So what should I do? Thank you very much.
2019-05-10 07:59:33 [030] training with 48.296028 samples/s
2019-05-10 07:59:33 save weights to backup2/000030.weights
2019-05-10 08:01:59 [031] processed 147839 samples, lr 1.000000e-03
Traceback (most recent call last):
  File "train.py", line 377, in <module>
    main()
  File "train.py", line 156, in main
    nsamples = train(epoch)
  File "train.py", line 221, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/public/home/G19850028/RWJ/pytorch-0.4-yolov3-master/dataset.py", line 14, in custom_collate
    data = torch.stack([item[0] for item in batch], 0)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 416 and 512 in dimension 2 at /opt/conda/conda-bld/pytorch_1550780889552/work/aten/src/TH/generic/THTensorMoreMath.cpp:1307
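Here the failure is raised inside custom_collate, but the cause is the same: the images in one batch do not share a height and width. As a stop-gap, and only under the assumption that the targets hold box coordinates normalized to [0, 1] (so resizing an image leaves them valid) with a fixed length per sample, a collate function could resize every image to the size of the first one before stacking. resize_collate is a hypothetical name, not a function from the repository:

```python
import torch
import torch.nn.functional as F


def resize_collate(batch):
    """Hypothetical collate_fn that forces all images in a batch to one size.

    Assumes each item is (CHW float image tensor, fixed-length target tensor)
    and that box coordinates in the target are normalized, so resizing the
    image does not require changing them. The first image sets the batch size.
    """
    h, w = batch[0][0].shape[-2:]
    images = []
    for img, _ in batch:
        if img.shape[-2:] != (h, w):
            # interpolate expects N x C x H x W, so add and then drop a batch dim
            img = F.interpolate(img.unsqueeze(0), size=(h, w), mode="bilinear",
                                align_corners=False).squeeze(0)
        images.append(img)
    data = torch.stack(images, 0)
    target = torch.stack([t for _, t in batch], 0)
    return data, target
```

It would be passed as DataLoader(..., collate_fn=resize_collate). This hides the mismatch rather than fixing whatever produces differently sized images, and bilinear resizing slightly alters pixel values, so it is more useful for confirming the diagnosis than as a permanent fix.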
@sdustdk1427 In that case, you can check the information as follows:
In dataset.py, expand the line data = torch.stack([item[0] for item in batch], 0) inside custom_collate like this:
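import sys

try:
    data = torch.stack([item[0] for item in batch], 0)
except RuntimeError:
    # item[0] is already an image tensor here, so print its shape (C, H, W) and dtype
    # to find the image whose size or channel count differs from the others
    for item in batch:
        print(item[0].shape, item[0].dtype, file=sys.stderr)
    raise  # re-raise so the original size-mismatch error is still reported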
Maybe the images are not all resized to the same size in training mode.
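A self-contained version of the same kind of check, runnable outside training: it counts how many images in a batch have each (C, H, W) shape. More than one distinct shape means torch.stack will fail, and the channel count also reveals an image that is grayscale or carries an alpha channel. summarize_batch_shapes and the dummy batch are illustrative only:

```python
from collections import Counter

import torch


def summarize_batch_shapes(batch):
    """Count images per (C, H, W) shape in a list of (image, target) items.

    More than one key in the result means torch.stack cannot batch them.
    """
    return Counter(tuple(item[0].shape) for item in batch)


batch = [(torch.zeros(3, 416, 416), None),
         (torch.zeros(3, 416, 416), None),
         (torch.zeros(3, 512, 512), None)]
print(summarize_batch_shapes(batch))
# Counter({(3, 416, 416): 2, (3, 512, 512): 1})
```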
I'd like to ask what the above code does. When I comment out def custom_collate(batch), training gets as far as 000050.weights, but then I run into the same problem as before:
258900: Layer(106) nGT 80, nRC 64, nRC75 25, nPP 107, loss: box 2.187, conf 3.256, class 2.181, total 7.624
2019-05-11 13:07:08 [050] training with 29.621098 samples/s
2019-05-11 13:07:08 save weights to backup5/000050.weights
2019-05-11 13:10:04 [051] processed 264078 samples, lr 1.000000e-03
258964: Layer(082) nGT 105, nRC 78, nRC75 31, nPP 114, loss: box 2.332, conf 2.150, class 1.424, total 5.906
258964: Layer(094) nGT 105, nRC 68, nRC75 17, nPP 0, loss: box 2.786, conf 5.809, class 6.787, total 15.382
258964: Layer(106) nGT 105, nRC 80, nRC75 28, nPP 97, loss: box 2.600, conf 4.034, class 3.364, total 9.998
Traceback (most recent call last):
  File "train.py", line 377, in <module>
    main()
  File "train.py", line 156, in main
    nsamples = train(epoch)
  File "train.py", line 221, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/public/home/G19850028/zheng/Anacoda3/public/home/G19850028/anacoda35/envs/pytorch1.0/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 209, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 416 and 448 in dimension 2 at /opt/conda/conda-bld/pytorch_1550780889552/work/aten/src/TH/generic/THTensorMoreMath.cpp:1307
What should I do?
@sdustdk1427 If you comment out "def custom_collate", the default collate_fn is used, so the behavior is exactly the same as in the first case (without a custom collate_fn). The custom_collate function is there to check for images of different sizes or types. I don't know the exact conditions of your environment; I suspect that either your experimental setup is misconfigured or there is a bug in my code. If the problem persists, I recommend trying another repository published on GitHub.
Thanks.
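The mismatched sizes in the three tracebacks (480, 512 and 448 against 416) are all multiples of 32, the step conventionally used by YOLOv2/v3 multi-scale training. That suggests, as an unconfirmed assumption, that a multi-scale schedule changed the input size for some samples in the middle of a batch. One common way to rule this out is to pick the scale once per batch in a batch sampler and hand it to the dataset along with each index. MultiScaleBatchSampler and ScaledDataset are hypothetical names; this is a sketch of the technique, not the repository's code:

```python
import random

import torch
from torch.utils.data import DataLoader, Dataset, RandomSampler


class MultiScaleBatchSampler:
    """Hypothetical batch sampler that picks one input size per batch.

    Yields lists of (index, size) pairs; all pairs in one list share a size.
    """

    def __init__(self, sampler, batch_size,
                 sizes=(320, 352, 384, 416, 448, 480, 512, 544, 576, 608)):
        self.sampler, self.batch_size, self.sizes = sampler, batch_size, sizes

    def __iter__(self):
        batch, size = [], random.choice(self.sizes)
        for idx in self.sampler:
            batch.append((idx, size))
            if len(batch) == self.batch_size:
                yield batch
                batch, size = [], random.choice(self.sizes)
        if batch:
            yield batch

    def __len__(self):
        return (len(self.sampler) + self.batch_size - 1) // self.batch_size


class ScaledDataset(Dataset):
    """Hypothetical dataset whose __getitem__ receives (index, size)."""

    def __len__(self):
        return 10

    def __getitem__(self, key):
        idx, size = key
        # A real dataset would load image idx and resize it to size x size here.
        return torch.zeros(3, size, size), torch.tensor(idx)


ds = ScaledDataset()
loader = DataLoader(ds, batch_sampler=MultiScaleBatchSampler(RandomSampler(ds), batch_size=4))
for data, idx in loader:
    print(data.shape)  # e.g. torch.Size([4, 3, 448, 448]); the last batch holds 2 images
```

Because the batch sampler runs in the main process and each worker receives the size together with its indices, workers cannot disagree about the scale of a batch.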