
Recognition Training Pytorch

Open dishant26 opened this issue 2 years ago • 1 comments

Bug description

I want to use docTR to train a recognition model on Indian-language data. I have created a dataset as specified here. The images in the dataset are 400x100x3, and I have also added the vocab in vocabs.py. Can someone explain why I am getting this error and what has to be changed?

Thank you!

Code snippet to reproduce the bug

python references/recognition/train_pytorch.py crnn_vgg16_bn --train_path /home/venkat/doctr-training/synthetic-images/train --val_path /home/venkat/doctr-training/synthetic-images/val --epochs 5 --device 0 --batch_size 4

Error traceback

Namespace(amp=False, arch='crnn_vgg16_bn', batch_size=4, device=0, epochs=5, find_lr=False, font='FreeMono.ttf,FreeSans.ttf,FreeSerif.ttf', input_size=32, lr=0.001, max_chars=12, min_chars=1, name=None, pretrained=False, push_to_hub=False, resume=None, sched='cosine', show_samples=False, test_only=False, train_path='/home/venkat/doctr-training/synthetic-images/train', train_samples=1000, val_path='/home/venkat/doctr-training/synthetic-images/val', val_samples=20, vocab='indian', wb=False, weight_decay=0, workers=None)
Validation set loaded in 0.3732s (50298 samples in 12575 batches)
Train set loaded in 0.3851s (117387 samples in 29346 batches)
IN forward(), input shape is torch.Size([4, 125, 1536])------------------| 0.00% [0/29346 00:00<00:00]
IN check_forward_args, input shape is torch.Size([4, 125, 1536])
IN check_input, shape of the input is torch.Size([4, 125, 1536])
Traceback (most recent call last):
  File "references/recognition/train_pytorch.py", line 447, in <module>
    main(args)
  File "references/recognition/train_pytorch.py", line 367, in main
    fit_one_epoch(model, train_loader, batch_transforms, optimizer, scheduler, mb, amp=args.amp)
  File "references/recognition/train_pytorch.py", line 117, in fit_one_epoch
    train_loss = model(images, targets)['loss']
  File "/home/venkat/doctr-training/doctr_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1103, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/venkat/doctr-training/doctr/doctr/models/recognition/crnn/pytorch.py", line 208, in forward
    logits, _ = self.decoder(features_seq)
  File "/home/venkat/doctr-training/doctr_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1103, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/venkat/doctr-training/doctr_env/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 693, in forward
    self.check_forward_args(input, hx, batch_sizes)
  File "/home/venkat/doctr-training/doctr_env/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 636, in check_forward_args
    self.check_input(input, batch_sizes)
  File "/home/venkat/doctr-training/doctr_env/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 209, in check_input
    self.input_size, input.size(-1)))
RuntimeError: input.size(-1) must be equal to input_size. Expected 512, got 1536

Environment

Collecting environment information...

DocTR version: 0.6.0a0
TensorFlow version: 2.6.2
PyTorch version: 1.10.1+cu102 (torchvision 0.11.2+cu102)
OpenCV version: 4.6.0
OS: Ubuntu 18.04.3 LTS
Python version: 3.6.9
Is CUDA available (TensorFlow): No
Is CUDA available (PyTorch): Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080
Nvidia driver version: 470.94
cuDNN version: Could not collect

Deep Learning backend

>>> from doctr.file_utils import is_tf_available, is_torch_available
>>> print(f"is_tf_available: {is_tf_available()}")
is_tf_available: True
>>> print(f"is_torch_available: {is_torch_available()}")
is_torch_available: True

dishant26 · Jul 19 '22 05:07

Hello @dishant26 :wave:

Sorry to hear about your trouble! Fortunately, with the information you shared, there are a few leads:

Channel-first vs channel-last

TensorFlow uses a channel-last scheme for tensor processing, while PyTorch uses channel-first.

So when you say

The images in the dataset are 400x100x3

I think you only need to save them using PIL, and the framework will handle the channel layout correctly. But if you saved them differently, save them as 3x400x100 instead :+1:
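To make the layout difference concrete, here is a minimal sketch in plain Python that converts a channel-last image (H x W x C) into a channel-first one (C x H x W). The helper names `hwc_to_chw` and `shape_of` are hypothetical and only for illustration; they are not part of docTR. With real arrays you would typically use `numpy.transpose(img, (2, 0, 1))` or `tensor.permute(2, 0, 1)` instead.

```python
def hwc_to_chw(image):
    """Convert a channel-last image (H x W x C nested lists) to channel-first (C x H x W)."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


def shape_of(nested):
    """Return the shape of a regular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


# A tiny 2 x 4 image with 3 channels, stored channel-last (H, W, C).
# Each pixel holds [row index, column index, 0] so values are easy to trace.
image_hwc = [[[y, x, 0] for x in range(4)] for y in range(2)]
image_chw = hwc_to_chw(image_hwc)

print(shape_of(image_hwc))  # (2, 4, 3)
print(shape_of(image_chw))  # (3, 2, 4)
```

The pixel values are unchanged; only the axis order differs, which is why a tensor in the wrong layout can still load without complaint and only fail later on a shape check.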

Architecture sizing

The model is created and sized to match the constraints specified in its constructor, so the training script needs to be passed the vocab in order to size the model correctly. Could you try to instantiate the model outside of the training script and see if you encounter the same problem please? :pray: On my end, training with a different vocab does size the model correctly
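One way to read the numbers in the traceback is sketched below. It assumes that the CRNN flattens the channel and height axes of the feature map into a single feature vector per time step before the recurrent decoder; that is an assumption about the model internals, not something the traceback states. The helper name `lstm_feature_size` is hypothetical.

```python
def lstm_feature_size(channels, feature_height):
    """Feature size per time step if channels and height are flattened together (assumed)."""
    return channels * feature_height


EXPECTED = 512  # "Expected 512" from the error message
GOT = 1536      # "got 1536" from the error message

# Which feature-map height would explain the observed size,
# if the extractor outputs EXPECTED channels?
matching_heights = [h for h in range(1, 9) if lstm_feature_size(EXPECTED, h) == GOT]
print(matching_heights)  # [3]
```

Under that assumption, a feature map of height 3 rather than 1 would produce exactly this mismatch, so it may be worth checking the shape of the image batch that reaches the model while instantiating it outside the training script.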

Cheers :v:

frgfm · Jul 20 '22 09:07

Worked for me. Thanks!

dishant26 · Aug 22 '22 13:08