Recognition Training Pytorch
Bug description
I want to use docTR to train a recognition model on Indian-language data. I have created a dataset as specified here. The images in the dataset are 400x100x3, and I have also added the vocab to vocabs.py. Can someone explain why I am getting this error and what needs to be changed?
Thank you!
Code snippet to reproduce the bug
python references/recognition/train_pytorch.py crnn_vgg16_bn --train_path /home/venkat/doctr-training/synthetic-images/train --val_path /home/venkat/doctr-training/synthetic-images/val --epochs 5 --device 0 --batch_size 4
Error traceback
Namespace(amp=False, arch='crnn_vgg16_bn', batch_size=4, device=0, epochs=5, find_lr=False, font='FreeMono.ttf,FreeSans.ttf,FreeSerif.ttf', input_size=32, lr=0.001, max_chars=12, min_chars=1, name=None, pretrained=False, push_to_hub=False, resume=None, sched='cosine', show_samples=False, test_only=False, train_path='/home/venkat/doctr-training/synthetic-images/train', train_samples=1000, val_path='/home/venkat/doctr-training/synthetic-images/val', val_samples=20, vocab='indian', wb=False, weight_decay=0, workers=None)
Validation set loaded in 0.3732s (50298 samples in 12575 batches)
Train set loaded in 0.3851s (117387 samples in 29346 batches)
IN forward(), input shape is torch.Size([4, 125, 1536])------------------| 0.00% [0/29346 00:00<00:00]
IN check_forward_args, input shape is torch.Size([4, 125, 1536])
IN check_input, shape of the input is torch.Size([4, 125, 1536])
Traceback (most recent call last):
File "references/recognition/train_pytorch.py", line 447, in <module>
main(args)
File "references/recognition/train_pytorch.py", line 367, in main
fit_one_epoch(model, train_loader, batch_transforms, optimizer, scheduler, mb, amp=args.amp)
File "references/recognition/train_pytorch.py", line 117, in fit_one_epoch
train_loss = model(images, targets)['loss']
File "/home/venkat/doctr-training/doctr_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1103, in _call_impl
return forward_call(*input, **kwargs)
File "/home/venkat/doctr-training/doctr/doctr/models/recognition/crnn/pytorch.py", line 208, in forward
logits, _ = self.decoder(features_seq)
File "/home/venkat/doctr-training/doctr_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1103, in _call_impl
return forward_call(*input, **kwargs)
File "/home/venkat/doctr-training/doctr_env/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 693, in forward
self.check_forward_args(input, hx, batch_sizes)
File "/home/venkat/doctr-training/doctr_env/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 636, in check_forward_args
self.check_input(input, batch_sizes)
File "/home/venkat/doctr-training/doctr_env/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 209, in check_input
self.input_size, input.size(-1)))
RuntimeError: input.size(-1) must be equal to input_size. Expected 512, got 1536
Environment
Collecting environment information...
DocTR version: 0.6.0a0
TensorFlow version: 2.6.2
PyTorch version: 1.10.1+cu102 (torchvision 0.11.2+cu102)
OpenCV version: 4.6.0
OS: Ubuntu 18.04.3 LTS
Python version: 3.6.9
Is CUDA available (TensorFlow): No
Is CUDA available (PyTorch): Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080
Nvidia driver version: 470.94
cuDNN version: Could not collect
Deep Learning backend
>>> from doctr.file_utils import is_tf_available, is_torch_available
>>> print(f"is_tf_available: {is_tf_available()}")
is_tf_available: True
>>> print(f"is_torch_available: {is_torch_available()}")
is_torch_available: True
Hello @dishant26 :wave:
Sorry to hear about your trouble! Fortunately, with the information you shared, there are a few leads:
Channel-first vs channel-last
TensorFlow uses a channel-last scheme for tensor processing, while PyTorch uses channel-first.
So when you say
"The images in the dataset are 400x100x3"
I think you only need to save them using PIL, and the framework will handle the layout correctly. But if you saved them some other way, save them as 3x400x100 instead :+1:
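To make the layout difference concrete, here is a minimal pure-Python sketch of moving the channel axis from last to first. The helper name `hwc_to_chw` is hypothetical and not part of docTR; it only illustrates the reordering on a tiny nested-list image.

```python
def hwc_to_chw(image):
    """Reorder a nested-list image from [height][width][channel] to [channel][height][width]."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A tiny 2x3 RGB image stored channel-last: each pixel is [R, G, B].
hwc = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[10, 11, 12], [13, 14, 15], [16, 17, 18]],
]
chw = hwc_to_chw(hwc)

print(len(chw), len(chw[0]), len(chw[0][0]))  # 3 2 3
print(chw[0])  # red plane: [[1, 4, 7], [10, 13, 16]]
```

With NumPy arrays or PyTorch tensors, the same reordering is usually written as `numpy.transpose(img, (2, 0, 1))` or `tensor.permute(2, 0, 1)`.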
Architecture sizing
The model is created and sized to match the constraints specified in its constructor, so the training script needs to be passed the vocab in order to size the model correctly. Could you try instantiating the model outside of the training script and check whether you hit the same problem, please? :pray: On my end, training with a different vocab does size the model correctly.
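For reference, the numbers in the traceback can be checked by hand. The sketch below assumes that the CRNN folds the channel and height axes of the feature map into one feature vector per horizontal position, which is what the logged shape [4, 125, 1536] suggests. `sequence_shape` is a hypothetical helper, not a docTR function, and the 512-channel figure is an assumption taken from the expected LSTM input size.

```python
def sequence_shape(batch, channels, height, width):
    """Shape of the sequence fed to the recurrent decoder.

    Assumption: a (batch, channels, height, width) feature map is reshaped to
    (batch, width, channels * height), one feature vector per column.
    """
    return (batch, width, channels * height)


# Feature map height 1: each timestep carries 512 features, as the LSTM expects.
print(sequence_shape(4, 512, 1, 32))   # (4, 32, 512)

# Feature map height 3: each timestep carries 3 * 512 = 1536 features,
# which matches the size reported in the RuntimeError.
print(sequence_shape(4, 512, 3, 125))  # (4, 125, 1536)
```

If that reading is right, the decoder was sized for a feature map of height 1 while the batches reaching the model produced one of height 3, so it is worth checking the shape of the images after the training transforms have run.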
Cheers :v:
Worked for me. Thanks!