TubeDETR Training error in tubedetr.py file.

Open OliverHxh opened this issue 2 years ago • 4 comments

I tried to train the network on the HC-STVGv2 dataset using the command provided in the README.md file:

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --ema \
  --load=pretrained_resnet101_checkpoint.pth --combine_datasets=hcstvg --combine_datasets_val=hcstvg \
  --v2 --dataset_config config/hcstvg.json --epochs=20 --output-dir=output --batch_size=8

Unfortunately, I hit the following error at line 180 of models/tubedetr.py:

  File "/root/paddlejob/workspace/STVG/TubeDETR/models/tubedetr.py", line 180, in forward
    tpad_src = tpad_src.view(b * n_clips, f, h, w)
RuntimeError: shape '[160, 256, 7, 12]' is invalid for input of size 2817024

The durations of the eight samples in the batch are: [100, 100, 69, 100, 65, 86, 100, 100].

I think this problem is probably related to the padding approach. Do you have any idea what causes this bug and how to fix it? Thank you very much!
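The numbers in the error message can be checked by hand. The short script below only redoes that arithmetic with values copied from the traceback; the variable names are made up for illustration, and it does not touch TubeDETR itself.

```python
# Values copied from the RuntimeError and from the issue text.
target_shape = (160, 256, 7, 12)   # (b * n_clips, f, h, w) requested by .view()
input_size = 2817024               # number of elements actually in tpad_src
batch_size = 8                     # from --batch_size=8

slots_expected, f, h, w = target_shape
per_slot = f * h * w                       # elements in one (f, h, w) slice
expected_size = slots_expected * per_slot  # elements .view() needs

# How many (f, h, w) slices does the tensor really hold?
slots_actual, remainder = divmod(input_size, per_slot)
missing = slots_expected - slots_actual

print("elements per slot:", per_slot)
print("expected elements:", expected_size)
print("slots per sample assumed by view:", slots_expected // batch_size)
print("slots actually present:", slots_actual, "remainder:", remainder)
print("missing slots:", missing)
```

The tensor holds a whole number of (f, h, w) slices, but 29 fewer than the 8 x 20 that the view assumes. That fits the idea that the shorter samples (69, 65 and 86 frames) are not padded up to the longest one before the reshape, although the arithmetic alone does not prove it.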

May 16 '22 18:05 OliverHxh

All my experiments used a batch size of 1 video per GPU, since long or high-resolution videos already take quite a bit of GPU memory, so the padding for larger batches may indeed need fixing.
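For anyone who wants to try larger batches, one generic way to make a reshape to (b * n_clips, f, h, w) valid is to pad every sample to the longest one in the batch and carry a mask for the padded positions. The sketch below is not TubeDETR's code: `pad_clip_features`, the per-sample layout (n_clips_i, f, h, w) and the toy sizes are assumptions made for illustration, and where such padding would have to go in models/tubedetr.py has not been checked.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pad_clip_features(per_sample):
    """Pad a list of (n_clips_i, f, h, w) tensors to a common clip count.

    Hypothetical helper, not part of TubeDETR.
    Returns the padded tensor (b, max_clips, f, h, w) and a boolean mask
    (b, max_clips) that is True at padded positions.
    """
    lengths = torch.tensor([t.shape[0] for t in per_sample])
    # pad_sequence pads along the first dimension with zeros.
    padded = pad_sequence(per_sample, batch_first=True)
    max_clips = padded.shape[1]
    mask = torch.arange(max_clips)[None, :] >= lengths[:, None]
    return padded, mask


# Toy batch: three samples with 5, 3 and 4 clips (sizes chosen arbitrarily).
feats = [torch.randn(n, 4, 2, 3) for n in (5, 3, 4)]
padded, mask = pad_clip_features(feats)

b, n_clips, f, h, w = padded.shape
# Every sample now has n_clips slots, so this reshape is always valid.
flat = padded.reshape(b * n_clips, f, h, w)
print(padded.shape, mask.shape, flat.shape)
```

The mask would then need to be passed on so that later attention or loss computations ignore the padded slots. Until something like this is in place, staying with 1 video per GPU, as in the experiments described above, avoids the problem.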

May 20 '22 08:05 antoyang

Hi, I encountered the same issue. Did you fix it?

Aug 17 '22 17:08 Glupapa

Hi, I want to increase the batch size, too. Did you fix it?

Apr 06 '23 15:04 hyundodo

Hi, was anybody able to solve this issue?

May 09 '23 05:05 AKASH2907
