Transformers-Tutorials icon indicating copy to clipboard operation
Transformers-Tutorials copied to clipboard

TrOCR Printed Text have Extra Results

Open abdksyed opened this issue 3 years ago • 7 comments

Hi,

I was trying to do inference on cropped text regions which are printed, using the trocr-base-printed model.

It's doing well, but on certain images it's giving some extra results which is weird.

Example 1

image Results: ACCOUNT ON D ON D The images is Account, which it predicted, but it's also adding ON D ON D

Example 2

image Result: 29/11/1936 / 2/ 2/ Similarly, the images is 29/11/1936, but also predicted extra / 2/ 2/

Example 3

image Result: PREVIOUS ONCLUSE SIGNE SUBE SOLDE SOLD The image is Previous, but the model is predicting many things along with Previos.

abdksyed avatar Nov 22 '21 06:11 abdksyed

Hi,

Thanks for reporting. I'm suspecting this may have to do with the decoding settings, rather than the model. In HuggingFace, one uses the generate method which is defined for all generative models. By default, it uses greedy decoding, but it can also perform beam search or top-k sampling decoding.

Can you try adjusting these settings, and report back?

Thanks!

NielsRogge avatar Nov 22 '21 09:11 NielsRogge

Thanks, Will look into top-k predictions, will run the inference with that and will get back here with results.

abdksyed avatar Nov 25 '21 06:11 abdksyed

Hi,

After investigation, it turns out the generate() method currently does not take into account config.decoder.eos_token_id, only config.eos_token_id.

You can fix it by setting model.config.eos_token_id = 2.

We will fix this soon.

NielsRogge avatar Nov 29 '21 17:11 NielsRogge

Hi thanks for all your efforts, I have one use case. Can we read texts from the images like shown in the image below, using TrOCR. I have tried with inference used in space by changing checkpoint to trocr-base-printed . I got weird results. Do I need to fine_tune the model by creating new dataset by showing actual target values then Should I use that model for inference. Sorry I am new to this space. Can you please guide me what to do to predict the actual text in the image. some times many other texts present in the image, how to get the only required text, instead of many other non essential texts. WhatsApp Image 2021-11-30 at 11 19 37

mldurga avatar Dec 11 '21 07:12 mldurga

Hi,

Yes that's possible, take a look at this Space: https://huggingface.co/spaces/vishnun/CRAFT-OCR.

It combines text detection (CRAFT) with text recognition (TrOCR).

NielsRogge avatar Dec 11 '21 09:12 NielsRogge

Hello, I have the same problem with handwritten text using google/vit-base-patch16-224-in21k as encoder and indobenchmark/indobert-base-p1 as decoder

All the result didn't predict well and give some extra results too even with train dataset, Example: e9pe3NgAKq1641542771-page0-1 Result: dan dalam sebagai sebagai sebagai. atau / akan / mana berjalan termasuk / mana awal misalnya atau atau karena ingat atau ". atau kemudian. tanpa tetaplah. atau atau atas harus ) adalah perlu jugaika lagi. * khusus lagi tanpa termasuk lokal dalam perlu perlu bantuan lagi dapat untuk luar termasuk atau atassemwiika hutan kita

I following fine tune seq2seq tutorial, but when I try fine tune with native pytorch It works well

dhea1323 avatar Apr 27 '22 03:04 dhea1323

Hello, I have the same problem with handwritten text using google/vit-base-patch16-224-in21k as encoder and indobenchmark/indobert-base-p1 as decoder

All the result didn't predict well and give some extra results too even with train dataset, Example: e9pe3NgAKq1641542771-page0-1 Result: dan dalam sebagai sebagai sebagai. atau / akan / mana berjalan termasuk / mana awal misalnya atau atau karena ingat atau ". atau kemudian. tanpa tetaplah. atau atau atas harus ) adalah perlu jugaika lagi. * khusus lagi tanpa termasuk lokal dalam perlu perlu bantuan lagi dapat untuk luar termasuk atau atassemwiika hutan kita

I following fine tune seq2seq tutorial, but when I try fine tune with native pytorch It works well

@dhea1323 Hi, I'm currently trying to train TrOCR on a German custom Data-Set. I am not sure what the correct settings are for Processor and Model. If I train to another language, can I do a warm start at all, or is that not possible because I am using a different tokenizer? I would be very happy if you could show me your model and processor initialization. And if there are other important things you could tell me. Thanks a lot!

jonas-da avatar Jul 09 '22 14:07 jonas-da

@abdksyed @NielsRogge were you able to solve this issue? Tried setting eos token as @NielsRogge said, but dint work out, any possible solutions?

suhas004 avatar Mar 23 '23 19:03 suhas004

Nope, the last time I checked, it didn't work. But I don't know after almost 18 months if there is any change or not.

abdksyed avatar Mar 23 '23 20:03 abdksyed