SynthText_Chinese_version

Help wanted: how to encode and decode languages other than Chinese and English, such as Arabic

Open • yingning opened this issue 7 years ago • 1 comment

Hello, friends. Thank you for your good work. Could you share more of your experience with encoding and decoding Chinese or other languages such as Arabic, @JarveeLee?

yingning, Nov 12 '17 08:11

@yingning I was also trying to make my own modification (based on ankush-me/SynthText). Here are some tips:

  • Prepare your own fonts, put them in data/font/, and add them to fontlist.txt.
  • Modify data/models/font_px2pt.cp. This is a pickle file of a dict object. You will know which key to add once you get a "key not found" error.
  • text_utils.py, line 518. This should be the __init__ of the TextSource class. Change with open(fn, 'r') as f: to with open(fn, 'r', encoding='utf8') as f:. Maybe it won't be 'utf8' for Arabic, I don't know; it depends on how your text file is saved.
  • Add your own text in data/newsgroup/newsgroup.txt. It consists of lines of text in which words are separated by spaces. This is a small problem for Chinese, because words are not separated by spaces in Chinese. I don't know how Arabic works, but be careful with it.
  • text_utils.py, line 130. There should be something like line_bounds = font.get_rect.... I got a "glyph not found for id 3" error here. The reason seems to be that extra spaces are added around the text in lines, so I added line = lines[np.argmax(lengths)].strip() to fix it. I still don't understand why the spaces matter. Maybe something is wrong with the font file.
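
The encoding and whitespace fixes from the tips above can be tried on their own with a small sketch like the one below. This is not the actual code in text_utils.py: the helper names read_text_lines and longest_line are hypothetical, the file is assumed to be saved as UTF-8 (which can represent both Chinese and Arabic), and max(..., key=len) is used as a plain-Python stand-in for the lines[np.argmax(lengths)] expression.

```python
def read_text_lines(path, encoding="utf8"):
    """Read a text file with an explicit encoding (hypothetical helper).

    Assumption: the file was saved in the given encoding. Without an
    explicit encoding, open() falls back to the platform default, which
    may fail on non-ASCII text.
    """
    with open(path, "r", encoding=encoding) as f:
        return [line.strip() for line in f]


def longest_line(lines):
    """Return the longest line with surrounding whitespace removed.

    Like lines[np.argmax(lengths)].strip(), this picks the first line of
    maximal length and then strips it.
    """
    return max(lines, key=len).strip()
```

If the file turns out to use a different encoding, only the encoding argument should need to change.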

That's all I did to make it work for Chinese. Just run it over and over again, fixing bugs as they come up.
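
To find missing entries in data/models/font_px2pt.cp before hitting the "key not found" error at run time, something like the following might help. It is only a sketch: missing_font_keys is a hypothetical helper, the names you pass in are assumed to be whatever key strings the error reports, and nothing here says what value should be stored for a new key.

```python
import pickle


def missing_font_keys(model_path, font_names):
    """Return the names in font_names that are not keys of the pickled dict.

    Hypothetical helper; it only inspects the keys and does not modify
    the file.
    """
    with open(model_path, "rb") as f:
        # encoding="latin1" lets Python 3 read pickles written by Python 2.
        models = pickle.load(f, encoding="latin1")
    return [name for name in font_names if name not in models]
```

For example, missing_font_keys("data/models/font_px2pt.cp", names) returns the subset of names that still needs an entry.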

zhengwx11, Nov 21 '17 04:11