Training error in tubedetr.py file.

Open OliverHxh opened this issue 2 years ago • 4 comments

I am trying to train the network on the HC-STVGv2 dataset using the command provided in README.md:

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --ema \
  --load=pretrained_resnet101_checkpoint.pth --combine_datasets=hcstvg --combine_datasets_val=hcstvg \
  --v2 --dataset_config config/hcstvg.json --epochs=20 --output-dir=output --batch_size=8

Unfortunately, I hit the following error at line 180 of models/tubedetr.py:

  File "/root/paddlejob/workspace/STVG/TubeDETR/models/tubedetr.py", line 180, in forward
    tpad_src = tpad_src.view(b * n_clips, f, h, w)
RuntimeError: shape '[160, 256, 7, 12]' is invalid for input of size 2817024

The durations of the eight samples in the batch are: [100, 100, 69, 100, 65, 86, 100, 100].

I think this problem is probably related to the padding approach. Do you have any idea what causes this bug and how to fix it? Thank you very much!
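The numbers in the error message can be checked by hand. The short Python sketch below only redoes the arithmetic from the traceback (the variable names are illustrative and not taken from the TubeDETR code): it compares the number of elements the `view` call asks for with the number the tensor actually holds.

```python
# Shape requested by the failing .view() call, copied from the RuntimeError.
requested_shape = (160, 256, 7, 12)  # (b * n_clips, f, h, w)
actual_numel = 2817024               # "input of size" from the RuntimeError

# Elements in one (f, h, w) slice.
per_slice = requested_shape[1] * requested_shape[2] * requested_shape[3]

# Elements the view would need in total.
requested_numel = requested_shape[0] * per_slice

# Number of (f, h, w) slices the tensor really contains.
actual_slices, remainder = divmod(actual_numel, per_slice)

print(requested_numel)           # 3440640
print(actual_slices, remainder)  # 131 0
```

So the tensor holds exactly 131 slices of shape (256, 7, 12), while the reshape asks for 160. That is consistent with the samples in the batch not all having the same length, although it does not by itself show where the padding goes wrong.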

OliverHxh, May 16 '22 18:05

All my experiments used a batch size of 1 video per GPU, since long videos at high resolution already take quite a bit of GPU memory, so there may indeed be some padding to fix for larger batches.
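As a rough illustration of what batch padding involves in general, here is a minimal pure-Python sketch. It assumes every sample is padded to the longest duration in the batch and that a boolean mask marks the padded positions. `build_padding_mask` is a hypothetical helper, not a function from the TubeDETR repository, and the durations from the report above are taken at face value.

```python
from typing import List, Tuple


def build_padding_mask(lengths: List[int]) -> Tuple[int, List[List[bool]]]:
    """Return (max_len, mask); mask[i][t] is True where sample i is padding.

    Hypothetical helper for illustration only. It uses the convention in
    which True marks a padded position.
    """
    max_len = max(lengths)
    mask = [[t >= n for t in range(max_len)] for n in lengths]
    return max_len, mask


# Durations of the eight samples quoted in the issue.
durations = [100, 100, 69, 100, 65, 86, 100, 100]
max_len, mask = build_padding_mask(durations)

padded_slots = len(durations) * max_len        # positions after padding
real_slots = sum(durations)                    # positions holding real data
pad_slots = sum(row.count(True) for row in mask)

print(max_len, padded_slots, real_slots, pad_slots)  # 100 800 720 80
```

After such a step every sample occupies the same number of positions, so a reshape to a fixed per-batch shape becomes well defined, and the mask tells later layers which positions to ignore. Whether this matches what the model expects at line 180 would need to be checked against the actual code.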

antoyang, May 20 '22 08:05

Hi, I encountered the same issue. Did you fix it?

Glupapa, Aug 17 '22 17:08

Hi, I want to increase the batch size, too. Did you fix it?

hyundodo, Apr 06 '23 15:04

Hi, was anybody able to solve this issue?

AKASH2907, May 09 '23 05:05