parseq
NAR decoder error
@baudm Hi, thanks for your great work.
I trained the PARSeq model on my own dataset; the max length is 89 and the chardict is 95full.yaml. When I test text lines with the NAR decoder, the result looks weird: there are a lot of repeated characters, as shown below:
image:
GT:11 SORSOGON ST LEVITOWN, CITY OF PARANAQUE, NCR
PRE: 11 SORSOGON ST LEVITOWN, CITY OF PARANAQUE,,NNRR,,,ITTTOOOWNNN,, CCCIITTTYY O OFFFF R VVV
But when I use the AR decoder, there are no repeated characters. Is there any way to alleviate this problem? Thanks.
Off the top of my head, I can only hypothesize that the repeated characters are caused by NAR decoding failing to recognize the end of the sequence. In short, [E] is not being decoded properly, which is why you're seeing more characters than expected.
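To illustrate the hypothesis, here is a minimal toy sketch (not PARSeq's actual code) of why a missed [E] produces extra characters. It assumes the decoder fills every output slot in parallel and the text is cut at the first [E]; `truncate_at_eos` and the token lists are made up for illustration.

```python
EOS = "[E]"  # end-of-sequence token named in the thread

def truncate_at_eos(tokens, eos=EOS):
    """Return the characters that come before the first end-of-sequence token."""
    out = []
    for tok in tokens:
        if tok == eos:
            break
        out.append(tok)
    return "".join(out)

# Every slot holds some prediction, including the slots after the true end.
clean = list("NCR") + [EOS] + list("RR,")   # [E] predicted at the right slot
missed = list("NCR") + [","] + list("RR,")  # [E] slot predicted as ',' instead

print(truncate_at_eos(clean))   # NCR
print(truncate_at_eos(missed))  # NCR,RR,
```

When the [E] slot is mispredicted, whatever sits in the trailing slots leaks into the output, which resembles the tail of the PRE string above.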
The current configuration (128x32 px images with 8x4 px patch size) is not expected to perform well for such a wide and short (in terms of height) text instance.
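A rough back-of-the-envelope check of that point, assuming the whole text line is resized to the full 128 px width and that patches tile the image without overlap (variable names are only for illustration):

```python
img_w, img_h = 128, 32    # input size from the thread
patch_w, patch_h = 8, 4   # patch size from the thread
num_chars = 89            # max label length reported above

cols = img_w // patch_w   # patch columns along the reading direction
rows = img_h // patch_h   # patch rows
chars_per_col = num_chars / cols
px_per_char = img_w / num_chars

print(cols, rows, cols * rows)   # 16 8 128
print(round(chars_per_col, 1))   # 5.6
print(round(px_per_char, 2))     # 1.44
```

Under these assumptions, an 89-character line leaves under 1.5 px of width per character, and each patch column has to cover more than five characters.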
One thing you could try is to modify AR decoding to decode more than 1 character at a time. This is easily doable with the current codebase I think. You could increase the max length to 90, then decode 10 characters at once to minimize AR iterations.
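The block-wise idea could look roughly like the sketch below. This is not the PARSeq API: `predict_block` is a hypothetical callable standing in for a model call that returns the next `block` tokens given the tokens decoded so far, and the toy predictor just reads from a fixed string.

```python
from typing import Callable, List

def blockwise_decode(
    predict_block: Callable[[List[str], int], List[str]],
    max_len: int = 90,
    block: int = 10,
    eos: str = "[E]",
) -> str:
    """Decode `block` tokens per iteration, conditioned on the tokens so far."""
    assert max_len % block == 0, "max_len should be a multiple of block"
    decoded: List[str] = []
    for _ in range(max_len // block):
        # Hypothetical model call: next `block` tokens given the current prefix.
        chunk = predict_block(decoded, block)
        for tok in chunk:
            if tok == eos:
                return "".join(decoded)
            decoded.append(tok)
    return "".join(decoded)

# Toy stand-in for the model: reads the answer from a fixed target sequence.
target = list("11 SORSOGON ST LEVITOWN") + ["[E]"]
calls: List[int] = []

def toy_predict(prefix: List[str], n: int) -> List[str]:
    calls.append(n)
    start = len(prefix)
    return (target + ["[E]"] * n)[start:start + n]

print(blockwise_decode(toy_predict))  # 11 SORSOGON ST LEVITOWN
print(len(calls))                     # 3
```

With max length 90 and blocks of 10, the loop runs at most 9 times, and it stops early once [E] appears inside a block.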
I also notice that you're trying to decode whitespace characters. That could also be a potential cause of the issue with NAR decoding. Another thing you could try is increasing the number of refinement iterations + ANDing the cloze mask with a mask which excludes low-confidence characters.
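A minimal sketch of the mask combination, in plain Python rather than the actual PARSeq tensors. It assumes the convention that True means "position i may use position j as context" (PyTorch boolean attention masks use the opposite meaning, so this would need inverting there). The confidences and the 0.5 threshold are made-up values.

```python
from typing import List

def cloze_mask(n: int) -> List[List[bool]]:
    """True where position i may use position j as context (all but itself)."""
    return [[i != j for j in range(n)] for i in range(n)]

def confident_context_mask(probs: List[float], threshold: float = 0.5) -> List[List[bool]]:
    """AND the cloze mask with a per-column 'confident enough' mask."""
    n = len(probs)
    keep = [p >= threshold for p in probs]
    cloze = cloze_mask(n)
    return [[cloze[i][j] and keep[j] for j in range(n)] for i in range(n)]

# Hypothetical per-character confidences from a first NAR pass.
probs = [0.99, 0.97, 0.20, 0.95]
mask = confident_context_mask(probs)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Here the third character is low-confidence, so its column is disabled for every row and the other positions are refined without conditioning on it.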
Thank you very much. I will try the suggestions and report back.