SynthText_Chinese_version: Help with encoding and decoding languages other than Chinese and English, such as Arabic

Open yingning opened this issue 7 years ago • 1 comments

Hello, friends. Thank you for your good work. Could you share more of your experience with encoding and decoding Chinese or other languages such as Arabic, @JarveeLee?

Nov 12 '17 08:11 yingning

@yingning I was also working on my own modification (based on ankush-me/SynthText). Here are some tips:

Prepare your own fonts, put them in data/font/, and add them to fontlist.txt.
Modify data/models/font_px2pt.cp. This is a pickle file containing a dict object. You will know which key to add once you get a key-not-found error.
text_utils.py, line 518. This should be the __init__ of the TextSource class. Change the line with open(fn, 'r') as f: so that it reads with open(fn, 'r', encoding='utf8') as f: instead. It may not be 'utf8' for Arabic; I don't know.
Add your own text to data/newsgroup/newsgroup.txt. The file consists of lines of text in which words are separated by spaces. This is a small problem for Chinese, because Chinese words are not separated by spaces. I don't know how Arabic works, but be careful with it.
text_utils.py, line 130. There should be something like line_bounds = font.get_rect.... I got an error here: glyph not found for id 3. The cause seems to be extra whitespace added around the text in the lines, so I added line = lines[np.argmax(lengths)].strip() to fix it. I still don't understand why the whitespace matters; maybe something is wrong with the font file.
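
The two text-handling tips above (an explicit encoding when opening the corpus, and stripping stray whitespace from the chosen line) can be sketched roughly as follows. This is only an illustration under assumptions, not the code in text_utils.py: load_lines and longest_line are hypothetical helper names, and the sample file is created just for the demo. UTF-8 can represent Arabic as well as Chinese, so the same encoding argument should work as long as the corpus file itself is saved as UTF-8.

```python
import os
import tempfile


def load_lines(path, encoding="utf-8"):
    """Read a corpus file with an explicit encoding, skipping blank lines."""
    with open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def longest_line(lines):
    """Pick the longest line and strip surrounding whitespace from it.

    Mirrors the idea of lines[np.argmax(lengths)].strip() without NumPy.
    """
    return max(lines, key=len).strip()


# Demo with a throwaway file; the sample text is illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("你好 世界\n  مرحبا بالعالم  \n\n")
    lines = load_lines(path)
    widest = longest_line(lines)
```

Stripping only the selected line leaves the rest of the corpus untouched, which matches the one-line fix described above.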

That's all I did to make it work for Chinese. Just run it over and over, fixing bugs as they come up.
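
A key-not-found error for font_px2pt.cp is one of the bugs that can show up while iterating like this. Below is a minimal sketch of adding a missing key to a pickled dict. The thread only says the file is a pickle of a dict, so the values are treated as opaque here; add_font_entry, the font names, and the numbers in the demo are all hypothetical, and copying an existing font's value is only a placeholder assumption, not a calibrated entry.

```python
import os
import pickle
import tempfile


def add_font_entry(model_path, font_name, template_font):
    """Add font_name to the pickled dict, reusing template_font's value."""
    with open(model_path, "rb") as f:
        # A pickle written by Python 2 may need pickle.load(f, encoding="latin1").
        model = pickle.load(f)
    if font_name not in model:
        # Assumption: borrowing another font's value is just a stand-in.
        model[font_name] = model[template_font]
        with open(model_path, "wb") as f:
            pickle.dump(model, f)
    return model


# Demo on a throwaway pickle; the names and numbers are made up.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "font_px2pt.cp")
    with open(model_path, "wb") as f:
        pickle.dump({"ExistingFont": [0.5, 1.0]}, f)
    updated = add_font_entry(model_path, "MyNewFont", "ExistingFont")
    with open(model_path, "rb") as f:
        reloaded = pickle.load(f)
```

It is worth backing up the original file before overwriting it, since the pickle is rewritten in place.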

Nov 21 '17 04:11 zhengwx11
