parseq
Sometimes the model predicts long, redundant, repetitive strings, like 'parseqqqqqqqqqqqq' (ground truth is 'parseq').
Is there any chance you know the reason? Thank you for your excellent work!
Hello, please provide more details such as the model and weights used as well as the exact image you're using.
I used PARSeq trained on a Chinese dataset with a charset of about 6K characters, and ran inference on a test dataset.
Sorry but I can't help you since:
- I have no access to and am not familiar with the specific model you're referring to.
- I have no access to and am not familiar with the data you're using.
- PARSeq was developed and tested with Latin characters, primarily on English text. I am not familiar with the intricacies of Chinese text.
You might want to try increasing the number of decoder layers, or using a larger version of the model, since the Chinese charset is much bigger than the Latin one.
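For illustration, something like this could be a starting point, assuming you launch training through the repo's Hydra-based train.py. The override keys and values below are assumptions on my part; check configs/model/parseq.yaml in your checkout for the actual names and defaults.

```python
# Sketch only: launch train.py with overrides that enlarge the decoder for a
# ~6K-character charset. The override keys are assumptions; verify them against
# configs/model/parseq.yaml before running.
import subprocess

overrides = [
    "model.dec_depth=2",    # assumed key: number of decoder layers
    "model.embed_dim=512",  # assumed key: wider embeddings for the bigger charset
]
subprocess.run(["python", "train.py", *overrides], check=True)
```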
So I tinkered a lot, and this is perhaps due to 'label_length' in main.yaml and the image size you are giving to the model. In my case, with the model on Hugging Face, if you input an image containing the single word 'parseq' followed by whitespace equivalent to 30-35 characters in total, the result is correct.
However, if we exceed this and input an image whose text is, say, longer than 40 characters, redundant repetition of characters appears:
- [attached image] gave me gatery.comminFreedom
- [attached image] gave me gateway.................
- [attached image] gave me gateway.
It is probably due to the fact that the model was trained on single-word images and will hallucinate on longer labels. I trained a model with label_length set to 65, and it was able to overcome this problem.
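For reference, one way to check this on the released weights is to load them and look at the maximum label length they were trained with. The loading and decoding calls below follow the usage shown in the repo README; the `hparams.max_label_length` attribute is my assumption (it is a constructor argument that Lightning saves), and 'long_text.png' is just a placeholder path.

```python
import torch
from PIL import Image
from strhub.data.module import SceneTextDataModule  # requires a cloned/installed parseq repo

# Load the released pretrained weights via torch.hub (as in the repo README).
parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()

# Maximum label length the checkpoint was trained with. Text longer than this
# cannot be decoded faithfully, which is where the repeated characters come from.
# (Attribute name assumed from the saved Lightning hyperparameters.)
print(parseq.hparams.max_label_length)

# Standard preprocessing and greedy decoding, as in the README.
img_transform = SceneTextDataModule.get_transform(parseq.hparams.img_size)
image = img_transform(Image.open('long_text.png').convert('RGB')).unsqueeze(0)
logits = parseq(image)
label, confidence = parseq.tokenizer.decode(logits.softmax(-1))
print(label[0])
```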
Thanks. I'll try your solution.