SynthText_Chinese_version
Help needed: how to encode and decode languages other than Chinese and English, such as Arabic
Hello, friends. Thank you for your good work. Can you share more of your experience with encoding and decoding Chinese or other languages like Arabic, @JarveeLee?
@yingning I was also trying to make my own modification (based on ankush-me/SynthText). Here are some tips:
- Prepare your own fonts, put them in `data/font/`, and add them to the `fontlist.txt`.
- Modify `data/models/font_px2pt.cp`. This is a pickle file of a `dict` object. You will know which key to add once you get a `key not found` error.
- `text_utils.py`, line 518. This should be the `__init__` of the `TextSource` class. Change `with open(fn, 'r') as f:` to `with open(fn, 'r', encoding='utf8') as f:`. Maybe it won't be 'utf8' for Arabic, I don't know.
- Add your own text to `data/newsgroup/newsgroup.txt`. It consists of lines of text in which words are separated by spaces. This is a small problem for Chinese, because words are not separated by spaces in Chinese. I don't know how Arabic works, but be careful with it.
- `text_utils.py`, line 130, there should be something like `line_bounds = font.get_rect...`. I got the error `glyph not found for id 3` here. The reason seems to be that extra whitespace is added around the text in `lines`, so I added `line = lines[np.argmax(lengths)].strip()` to fix it. I still don't understand why the whitespace matters. Maybe something is wrong with the font files.
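
For the `font_px2pt.cp` step, a minimal sketch of editing a pickled `dict` might look like the following. The helper name `add_dict_entry` is hypothetical, and what value belongs under the new key is not covered here; inspecting an existing entry shows the expected shape.

```python
import pickle


def add_dict_entry(pickle_path, key, value):
    """Load a pickled dict, add one entry, and write the dict back."""
    with open(pickle_path, "rb") as f:
        # Assumption: if the file was written by Python 2, pickle.load
        # may need encoding="latin1" here. Not verified for this file.
        table = pickle.load(f)
    if not isinstance(table, dict):
        raise TypeError("expected a pickled dict")
    table[key] = value  # key: the name reported by the 'key not found' error
    with open(pickle_path, "wb") as f:
        pickle.dump(table, f)
    return table
```

It would be called as `add_dict_entry("data/models/font_px2pt.cp", missing_key, value)`, where `missing_key` and `value` are placeholders. Back up the original file first, since the sketch overwrites it in place.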
That's all I did to make it work for Chinese. Just run it over and over again, fixing bugs as they come up.
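
The two text-handling changes (explicit encoding when reading the corpus, and stripping the longest line before rendering) can be sketched in isolation like this. The function names are hypothetical and this is not the code from `text_utils.py`; the second function uses plain Python as a stand-in for the `np.argmax` expression, assuming `lengths` holds the character count of each line.

```python
def read_corpus(path, encoding="utf8"):
    """Read a text corpus line by line with an explicit encoding."""
    with open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


def longest_line_stripped(lines):
    """Return the longest line with surrounding whitespace removed."""
    lengths = [len(line) for line in lines]
    # First index of the maximum, the same choice np.argmax(lengths) makes.
    idx = lengths.index(max(lengths))
    return lines[idx].strip()
```

Passing a different `encoding` argument is the place to experiment if the corpus is not stored as UTF-8.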