SwinTextSpotter
Tensor shapes conflict when training on VinText
I tried to train the model on the VinText dataset and got this traceback after several iterations:
```
Traceback (most recent call last):
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/detectron2/engine/train_loop.py", line 140, in train
    self.run_step()
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/detectron2/engine/defaults.py", line 441, in run_step
    self._trainer.run_step()
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/detectron2/engine/train_loop.py", line 234, in run_step
    loss_dict = self.model(data)
  File "/home/ccbien/miniconda3/envs/scene_text/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/projects/SWINTS/swints/swints.py", line 184, in forward
    loss_dict = self.criterion(output, targets, self.mask_encoding)
  File "/home/ccbien/miniconda3/envs/scene_text/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/projects/SWINTS/swints/loss.py", line 153, in forward
    losses.update(self.get_loss(loss, outputs, targets, indices, num_boxes, mask_encoding))
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/projects/SWINTS/swints/loss.py", line 135, in get_loss
    return loss_map[loss](outputs, targets, indices, num_boxes, mask_encoding, **kwargs)
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/projects/SWINTS/swints/loss.py", line 75, in loss_boxes
    raise e
  File "/home/ccbien/projects/SceneText/exp/SwinTextSpotter/projects/SWINTS/swints/loss.py", line 69, in loss_boxes
    src_boxes_ = src_boxes / image_size
RuntimeError: The size of tensor a (300) must match the size of tensor b (377) at non-singleton dimension 0
```
Config:

```yaml
_BASE_: "Base-SWINTS_swin.yaml"
MODEL:
  SWINTS:
    NUM_PROPOSALS: 300
    NUM_CLASSES: 2
    REC_HEAD:
      BATCH_SIZE: 1
DATASETS:
  TRAIN: ("vintext_train", "vintext_val")
  TEST: ("vintext_test",)
SOLVER:
  IMS_PER_BATCH: 1
  STEPS: (360000, 420000)
  MAX_ITER: 100000
  CHECKPOINT_PERIOD: 10000
INPUT:
  FORMAT: "RGB"
```
Training was going well until it hit the bad sample:

```
src_boxes.shape  = torch.Size([300, 4])
image_size.shape = torch.Size([377, 4])
```

Here `src_boxes.shape` is consistent with `NUM_PROPOSALS` in the config, so I suspect there is an issue with the ground-truth annotations (downloaded originally from README.MD).
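For context, the error is just PyTorch's broadcasting rule: elementwise division requires matching (or singleton) sizes in every dimension, and 300 proposals vs. 377 ground-truth rows in dimension 0 cannot broadcast. A minimal sketch reproducing the failure, with the shapes taken from the log above:

```python
import torch

src_boxes = torch.zeros(300, 4)   # one row per proposal (NUM_PROPOSALS)
image_size = torch.ones(377, 4)   # one row per ground-truth instance

try:
    src_boxes / image_size
except RuntimeError as e:
    # 300 != 377 and neither is 1, so broadcasting fails in dim 0
    print(e)
```

So any image whose annotation file carries more than `NUM_PROPOSALS` instances will crash the loss in exactly this way.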
You are right. The number of ground-truth instances needs to be limited. You can try to fix the code here. I will update it later.
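Until the repo is patched, one workaround is to cap each image's ground-truth instances at `NUM_PROPOSALS` before the targets reach the loss (e.g. in the dataset mapper). A minimal sketch, assuming the targets are per-image dicts with one entry per instance in each field — the helper name and the `"boxes"` key are illustrative, not the actual SWINTS field names:

```python
NUM_PROPOSALS = 300  # must match MODEL.SWINTS.NUM_PROPOSALS in the config

def cap_ground_truth(target, max_instances=NUM_PROPOSALS):
    """Truncate every per-instance field of a target dict to at most
    max_instances entries, leaving per-image fields untouched.

    Hypothetical helper: adapt the keys to the real SWINTS target layout.
    """
    n = len(target["boxes"])
    if n <= max_instances:
        return target
    # Keep only fields whose length equals the instance count
    # (boxes, labels, masks, ...); pass everything else through.
    return {
        k: (v[:max_instances] if hasattr(v, "__len__") and len(v) == n else v)
        for k, v in target.items()
    }
```

A smarter variant would keep the `max_instances` largest boxes instead of the first ones, but simple truncation is enough to stop the crash.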