TubeDETR
Training error in tubedetr.py file.
I tried to train the network on the HC-STVGv2 dataset using the command provided in the README.md file:
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --ema \
--load=pretrained_resnet101_checkpoint.pth --combine_datasets=hcstvg --combine_datasets_val=hcstvg \
--v2 --dataset_config config/hcstvg.json --epochs=20 --output-dir=output --batch_size=8
Unfortunately, I encountered this issue in models/tubedetr.py at line 180:
File "/root/paddlejob/workspace/STVG/TubeDETR/models/tubedetr.py", line 180, in forward
tpad_src = tpad_src.view(b * n_clips, f, h, w)
RuntimeError: shape '[160, 256, 7, 12]' is invalid for input of size 2817024
Besides, the durations of the eight samples are: [100, 100, 69, 100, 65, 86, 100, 100].
I think this problem is probably related to the padding approach. Do you have any clue about this bug and how to fix it? Thank you very much!
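A minimal sketch of what the error suggests is happening (all shapes below are taken from the error message, but the variable names, per-sample clip counts, and the padding fix are assumptions for illustration, not the repository's actual code): `view()` requires the tensor's element count to match the target shape exactly, so if samples with different durations contribute different numbers of clips, the concatenated features are smaller than `b * n_clips` and the reshape fails. Padding each sample's clip features to the batch maximum before stacking makes the batch rectangular again.

```python
import torch

# Assumed values inferred from the error: 160 = b * n_clips, and
# 2817024 / (256 * 7 * 12) = 131 actual clips in the batch.
b, n_clips, f, h, w = 8, 20, 256, 7, 12

# With fewer than b * n_clips rows, view() raises the reported RuntimeError:
ragged = torch.zeros(131, f * h * w)
try:
    ragged.view(b * n_clips, f, h, w)
except RuntimeError as e:
    print(e)  # shape '[160, 256, 7, 12]' is invalid for input of size 2817024

# One possible fix: zero-pad each sample's clip features to the max clip
# count so every sample contributes exactly n_clips clips.
clips_per_sample = [20, 20, 14, 20, 13, 18, 20, 20]  # hypothetical counts
feats = [torch.zeros(n, f, h, w) for n in clips_per_sample]
padded = [torch.cat([x, x.new_zeros(n_clips - x.size(0), f, h, w)]) for x in feats]
batch = torch.stack(padded)           # (b, n_clips, f, h, w), rectangular
out = batch.view(b * n_clips, f, h, w)  # now a valid reshape
```

With batch size 1 per GPU (as used in the original experiments) every batch trivially has a single clip count, which would explain why the mismatch only appears at larger batch sizes.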
All my experiments used a batch size of 1 video per GPU, since long videos at high resolution already take quite a bit of GPU memory, so there may indeed be some padding to fix.
Hi, I encountered the same issue. Did you fix it?
Hi, I want to increase the batch size too. Did you fix it?
Hi, was anybody able to solve this issue?