
Tensor shapes conflict when training on VinText

Open · ccbien opened this issue 2 years ago · 1 comment

I tried to train the model on the VinText dataset and got this traceback after several iterations:

Traceback (most recent call last):
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/detectron2/engine/train_loop.py", line 140, in train
    self.run_step()
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/detectron2/engine/defaults.py", line 441, in run_step
    self._trainer.run_step()
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/detectron2/engine/train_loop.py", line 234, in run_step
    loss_dict = self.model(data)
  File "/home/ccbien/miniconda3/envs/scene_text/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/projects/SWINTS/swints/swints.py", line 184, in forward
    loss_dict = self.criterion(output, targets, self.mask_encoding)
  File "/home/ccbien/miniconda3/envs/scene_text/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/projects/SWINTS/swints/loss.py", line 153, in forward
    losses.update(self.get_loss(loss, outputs, targets, indices, num_boxes, mask_encoding))
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/projects/SWINTS/swints/loss.py", line 135, in get_loss
    return loss_map[loss](outputs, targets, indices, num_boxes, mask_encoding, **kwargs)
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/projects/SWINTS/swints/loss.py", line 75, in loss_boxes
    raise e
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/projects/SWINTS/swints/loss.py", line 69, in loss_boxes
    src_boxes_ = src_boxes / image_size
RuntimeError: The size of tensor a (300) must match the size of tensor b (377) at non-singleton dimension 0

Config:

_BASE_: "Base-SWINTS_swin.yaml"
MODEL:
  SWINTS:
    NUM_PROPOSALS: 300
    NUM_CLASSES: 2
  REC_HEAD:
    BATCH_SIZE: 1
DATASETS:
  TRAIN: ("vintext_train", "vintext_val")
  TEST:  ("vintext_test",)
SOLVER:
  IMS_PER_BATCH: 1
  STEPS: (360000,420000)
  MAX_ITER: 100000
  CHECKPOINT_PERIOD: 10000
INPUT:
  FORMAT: "RGB"

Training was going well until it reached this bad sample:

src_boxes.shape = torch.Size([300, 4])
image_size.shape = torch.Size([377, 4])

Here src_boxes.shape is consistent with NUM_PROPOSALS in the config, so I guess there is some issue with the ground-truth annotations (downloaded from the link in README.md).

ccbien · Sep 14 '22 10:09
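
The mismatch suggests the failing image has more ground-truth text instances (377) than NUM_PROPOSALS (300). A quick way to confirm which images exceed the proposal budget is to count annotations per image. The sketch below assumes the VinText annotations are in a COCO-style JSON file; the path is only illustrative and may differ from the actual dataset layout:

# Count ground-truth instances per image and list images that exceed
# NUM_PROPOSALS. Assumes COCO-style JSON; the path is hypothetical.
import json
from collections import Counter

NUM_PROPOSALS = 300  # matches MODEL.SWINTS.NUM_PROPOSALS in the config above

with open("datasets/vintext/train.json") as f:  # hypothetical annotation path
    coco = json.load(f)

counts = Counter(ann["image_id"] for ann in coco["annotations"])
offenders = {img_id: n for img_id, n in counts.items() if n > NUM_PROPOSALS}

print(f"{len(offenders)} images have more than {NUM_PROPOSALS} instances")
for img_id, n in sorted(offenders.items(), key=lambda kv: -kv[1])[:10]:
    print(img_id, n)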

You are right. The number of ground-truth instances needs to be limited. You can try to fix the code here; I will update it later.

mxin262 · Sep 17 '22 07:09
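
A minimal sketch of the workaround suggested above: cap the number of ground-truth instances per training sample so it never exceeds NUM_PROPOSALS before the targets reach the loss. Where exactly the cap belongs in SwinTextSpotter (the dataset mapper vs. target preparation in swints.py) is an assumption here; only the idea of limiting the ground truth comes from the comment above, and the function name is hypothetical.

# Hedged sketch: limit ground-truth instances per sample to NUM_PROPOSALS.
# `dataset_dict` is assumed to be a detectron2-style dataset dict whose
# "annotations" key holds the list of ground-truth text instances.
import random

def cap_gt_instances(dataset_dict, max_instances=300):
    anns = dataset_dict.get("annotations", [])
    if len(anns) > max_instances:
        # Randomly subsample so the same instances are not always dropped;
        # plain truncation (anns[:max_instances]) would also silence the error.
        dataset_dict["annotations"] = random.sample(anns, max_instances)
    return dataset_dict

Alternatively, raising MODEL.SWINTS.NUM_PROPOSALS above the maximum number of instances in any training image avoids dropping annotations, at the cost of memory and training speed.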