TextRecognitionDataGenerator
Hindi text renders incorrectly
When I use this repo to generate Hindi text images, it runs without errors, but all of the rendered characters are incorrect. PIL's truetype font rendering only performs a simple character-to-glyph mapping, whereas Hindi requires complex text layout.
The same problem is reported here: https://github.com/python-pillow/Pillow/issues/3191
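To see why a one-code-point-to-one-glyph renderer cannot draw Hindi correctly, it helps to list the code points in a word. The following is a minimal sketch using only Python's standard unicodedata module; the sample word हिन्दी and the helper name describe are illustrative choices, not taken from this thread.

```python
import unicodedata

def describe(text):
    """Return (character, Unicode name, general category) for each code point."""
    return [(ch, unicodedata.name(ch), unicodedata.category(ch)) for ch in text]

# The word "Hindi" in Devanagari, written with escapes so each code point is visible.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

for ch, name, cat in describe(word):
    print(f"U+{ord(ch):04X} {cat} {name}")

# Categories Mn and Mc are combining marks: they attach to, or are reordered
# around, a base consonant instead of being drawn as independent glyphs.
marks = [ch for ch in word if unicodedata.category(ch) in ("Mn", "Mc")]
print(len(word), "code points,", len(marks), "combining marks")
```

Here the vowel sign U+093F is stored after its consonant but drawn before it, and the virama U+094D joins the two consonants around it into a conjunct. Both steps need a shaping engine rather than a per-character glyph lookup.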
So it's much like Arabic, where ligatures are not rendered properly.
I'll take a look at that Pillow issue; they seem to have linked to a few libraries that address the problem.
Thank you for reporting this.
@Belval @KosukeHao How can I generate Hindi images? It is mentioned that the language must be French, English, Spanish, German, or Chinese.
There is no solution as of today. Unfortunately, ligatures are still not supported, and the dependencies needed to support them are unknown.
I would be very interested if you can make a PR that has a working sample. I would love to work on it myself, but I do not know Hindi and cannot see the difference between good and bad samples. If you can provide clear examples of what TRDG should output given a specific input I could give it another try.
Installing libraqm, as suggested here, should help in most cases:
https://stackoverflow.com/questions/39630916/
Edit:
Sometimes installing libraqm causes the following error:
OSError: invalid face handle
Sometimes it just works. I'm not sure what the cause could be.
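Before generating images, it may be worth checking whether the installed Pillow build actually reports Raqm support, since installing libraqm may not be enough if Pillow cannot load it. This is a hedged sketch: features.check("raqm") and the layout_engine argument are Pillow APIs, while the function names raqm_available and render and the font path argument are placeholders assumed for illustration. It does not diagnose the "invalid face handle" error, whose cause is not established in this thread.

```python
def raqm_available():
    """Return True if Pillow is installed and reports Raqm (complex layout) support."""
    try:
        from PIL import features
        return bool(features.check("raqm"))
    except (ImportError, ValueError):
        # Pillow is missing, or too old to know the "raqm" feature name.
        return False

def render(text, font_path, size=64):
    """Draw text with the Raqm layout engine; font_path is a placeholder."""
    from PIL import Image, ImageDraw, ImageFont
    # Pillow >= 9.1 exposes ImageFont.Layout.RAQM; older releases use LAYOUT_RAQM.
    layout = getattr(ImageFont, "Layout", None)
    engine = layout.RAQM if layout is not None else ImageFont.LAYOUT_RAQM
    font = ImageFont.truetype(font_path, size, layout_engine=engine)
    image = Image.new("RGB", (size * max(len(text), 1), size * 2), "white")
    ImageDraw.Draw(image).text((10, 10), text, font=font, fill="black")
    return image

print("Raqm available:", raqm_available())
```

If the check prints False, text will be laid out with the basic engine and complex scripts will still render incorrectly, regardless of which font is used.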
Is this issue solved? I would like to use the library to generate images in Indian Punjabi (Gurmukhi script), which is similar to Hindi (Devanagari script). Please let me know if you need any help with the Punjabi language.
As you can see, it is adding spaces between characters. The actual text is ਚੋਭਾ ਸਾਥਣ ਸਨੋਲੀ ਸ਼ੋਨਫੋਲ ਦਰਦਰਾ
I used these parameters to generate it: -l pb -c 10 -w 5 -f 64 -dt dicts\pb.txt
This might help: https://github.com/Belval/TextRecognitionDataGenerator/pull/164#issuecomment-732970029
Thanks Belval & GokuINC. Enabling --word_split and installing libraqm seems to solve the problem.
Label: ਉੱਪੁਰ ਟਾਇਰੀ leaves ਤੱਕ ਮੁਫ਼ਤ_6
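The gaps between characters reported earlier in this thread are what one would expect if each code point is drawn as a separate unit. Below is a minimal sketch of the difference, assuming (this is not confirmed in the thread) that the generator splits text per character by default and per word when --word_split is enabled. The helper split_units is hypothetical and is not TRDG's own code.

```python
import unicodedata

def split_units(text, word_split):
    """Split text into the units that would each be drawn on their own."""
    return text.split(" ") if word_split else list(text)

sample = "ਚੋਭਾ ਸਾਥਣ"  # first two words of the Punjabi text quoted in this thread

per_char = split_units(sample, word_split=False)
per_word = split_units(sample, word_split=True)

# Vowel signs that end up as their own unit are separated from their base
# consonant, so a shaping engine such as Raqm has nothing to attach them to.
orphaned = [u for u in per_char if unicodedata.category(u) in ("Mn", "Mc")]

print(len(per_char), "units per character,", len(orphaned), "orphaned marks")
print(len(per_word), "units per word")
```

Under that assumption, word-level splitting hands each whole word to the layout engine, which would explain why --word_split and libraqm are both needed for correct output.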