deep-text-recognition-benchmark

Training on Chinese data has problems with long text support

Open Single430 opened this issue 3 years ago • 5 comments

Below is the configuration file and training log. Looking forward to your reply.

IMG_20210420_103511

Ocr2 @ku21fan

Single430 avatar Apr 20 '21 02:04 Single430

In theory, if the model works well on short text, it can also recognize long text with good accuracy if you split the sequence into shorter pieces. I think you should use a sliding-window technique to solve this problem.
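One way to read the sliding-window suggestion is sketched below, purely as an illustration: it only computes overlapping horizontal crop ranges for a wide text-line image. The function name `sliding_windows` and the window and stride values are hypothetical and not part of the repository; merging the per-window predictions back into one string is not shown.

```python
def sliding_windows(width, window, stride):
    """Return (start, end) pixel ranges that cover [0, width) with overlapping windows."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if width <= window:
        # The whole image already fits in one window.
        return [(0, width)]
    starts = list(range(0, width - window + 1, stride))
    # Add a final window flush with the right edge if it is not covered yet.
    if starts[-1] + window < width:
        starts.append(width - window)
    return [(s, s + window) for s in starts]


# Hypothetical example: a 1000 px wide line, 320 px windows, 240 px stride.
for start, end in sliding_windows(width=1000, window=320, stride=240):
    print(start, end)
```

Each range could then be cropped and passed to the recognizer separately. The overlap (window minus stride) is there so that a character cut by one window boundary appears whole in a neighbouring window.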

nguyenviettuan96 avatar May 06 '21 06:05 nguyenviettuan96

By the nature of CRNN, you can run inference on a sentence of any length. However, in the training stage each sample is resized to a fixed size, so if a training sample is too long, the resized image becomes blurred, the characters can no longer be distinguished, and the training accuracy cannot improve. So please choose a reasonable maximum length for your long text :)
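The effect of the fixed-size resize can be made concrete with some rough arithmetic. The sketch below assumes a target size of 32 x 100 (height x width), which I believe matches the repository's default `--imgH` and `--imgW` flags, but treat those numbers as an assumption; `resize_stats` is a hypothetical helper, not a function from the repository.

```python
def resize_stats(orig_w, orig_h, n_chars, img_h=32, img_w=100):
    """Estimate what a fixed-size resize does to a text-line image.

    Returns (pixels of width per character after resizing,
             how many times more the width is shrunk than the height).
    img_h and img_w are assumed defaults, not values confirmed by this thread.
    """
    px_per_char = img_w / n_chars
    squeeze = (orig_w / img_w) / (orig_h / img_h)
    return px_per_char, squeeze


# Hypothetical sample: a 1600 x 32 line holding 50 characters.
px_per_char, squeeze = resize_stats(orig_w=1600, orig_h=32, n_chars=50)
print(px_per_char)  # 2.0 pixels of width left per character
print(squeeze)      # 16.0, the width is compressed 16x more than the height
```

With only about two pixels of width per character, most Chinese glyphs would not remain legible, which is consistent with the blurring described above.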

zhtmike avatar May 06 '21 12:05 zhtmike

@nguyenviettuan96 @zhtmike Thank you very much for your reply, but I don’t think this is the best way to solve the problem, I will continue to try other things.

Single430 avatar Jul 26 '21 09:07 Single430

@nguyenviettuan96 @zhtmike Thank you very much for your reply, but I don’t think this is the best way to solve the problem, I will continue to try other things.

Hello, have you found a better solution?

Zhou2019 avatar May 20 '22 04:05 Zhou2019

With CTC loss, if the output sequence length of the LSTM is shorter than the label (ground truth), the loss is infinite. For example, if the sequence length is 25 and your text has 50 characters, the loss is infinite, and PyTorch sets it to zero (if zero_infinity is set to True), so the model cannot learn the long text. There are two solutions:

  1. Attention doesn't have this problem. I trained it with text up to 150 characters long and got good results (change the input size to 46*320).

  2. Change the output shape of the feature extraction. By default the feature map length is between 24 and 26; if you change the input size to 32*320, it is between 80 and 84. If you need longer text, add upsampling at the end of the model.

See the PyTorch CTC loss documentation: https://pytorch.org/docs/stable/generated/torch.nn.CTCLoss.html
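The length condition can be checked before training with a small sketch like the one below. It relies on the standard CTC constraint that the number of time steps must be at least the label length plus one extra step for every pair of identical adjacent characters (a blank has to separate them). The helper names are hypothetical, and `approx_time_steps` assumes a 4x horizontal downsampling, which is only a rough fit to the figures quoted above (about 25 steps at width 100, about 80 at width 320).

```python
def min_ctc_steps(label):
    """Minimum number of time steps CTC needs to emit `label`."""
    # Identical neighbours need a blank between them, costing one extra step each.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def approx_time_steps(img_w, downsample=4):
    """Rough sequence length for a given input width (assumed 4x downsampling)."""
    return img_w // downsample


def ctc_feasible(label, img_w):
    """True if the label can fit in the estimated number of time steps."""
    return approx_time_steps(img_w) >= min_ctc_steps(label)


label = "ab" * 25  # hypothetical 50-character label with no adjacent repeats
print(ctc_feasible(label, img_w=100))  # False: about 25 steps for 50 characters
print(ctc_feasible(label, img_w=320))  # True: about 80 steps for 50 characters
```

Samples that fail this check contribute an infinite loss, which `torch.nn.CTCLoss` replaces with zero when `zero_infinity=True`, so they are effectively skipped rather than learned. Filtering or logging such samples makes the problem visible instead of silent.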

raminrahimi6970 avatar Jun 02 '22 06:06 raminrahimi6970