char-rnn-tensorflow

Does not work with other languages

Open ShuvenduRoy opened this issue 7 years ago • 5 comments

What do I have to change to support UTF-8, so that I can train it on other languages?

ShuvenduRoy avatar Nov 05 '17 17:11 ShuvenduRoy

It actually should work with utf-8 if you're using the latest version.

Which versions of the following are you using?

  • char-rnn-tensorflow
  • tensorflow
  • python

Thanks.

john-parton avatar Nov 22 '17 17:11 john-parton

Actually, the sample outputs my Greek text as raw UTF-8 bytes: "\xcf\xce\x83, \xb1\xb9\ .........."

lowtronik avatar Dec 09 '17 09:12 lowtronik

@lowtronik That is the hex-escaped form of a bytes object. Just decode it: result.decode("utf-8", "replace")

ShuvenduRoy avatar Dec 09 '17 09:12 ShuvenduRoy
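
For reference, a minimal sketch of the decode suggestion above, assuming Python 3 and that the sampling code returns a bytes object. The helper name decode_sample and the Greek test string are hypothetical, not taken from the repository.

```python
def decode_sample(raw: bytes) -> str:
    """Turn UTF-8 bytes back into readable text.

    errors="replace" swaps any invalid or truncated byte sequence for
    U+FFFD instead of raising UnicodeDecodeError, which helps when a
    character-level model emits an incomplete multi-byte character.
    """
    return raw.decode("utf-8", "replace")


# Hypothetical stand-in for the sampler's output: Greek text encoded as UTF-8.
raw = "αβγ".encode("utf-8")

# Printing the bytes object directly shows the \x.. escapes from the report.
escaped = repr(raw)

# Decoding recovers the original characters.
text = decode_sample(raw)

# A sequence cut off mid-character still decodes, with a replacement mark.
truncated = decode_sample(raw[:-1])
```

Here text holds the readable Greek string, while truncated ends in the U+FFFD replacement character rather than raising an exception.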

@ShuvenduBikash I just deleted .encode('utf-8') and it works

lowtronik avatar Dec 09 '17 19:12 lowtronik

I have the same problem; it generates raw text like this:

\xc3\xa8p

However, if I follow your suggestion and delete .encode('utf-8'), it fails with this error:

UnicodeEncodeError: 'ascii' codec can't encode character '\u201c' in position 444: ordinal not in range(128)

foocp avatar Feb 16 '18 13:02 foocp
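
A note on the error above: a UnicodeEncodeError from the 'ascii' codec when printing usually means the output stream itself is configured for ASCII (for example under a C/POSIX locale), so the str cannot be written as-is. The sketch below assumes Python 3 and that this is the cause here, which the thread does not confirm. The helper write_utf8 and the sample string are hypothetical.

```python
import io

# Hypothetical generated text containing a curly quote (U+201C) and Greek.
sample = "\u201cγειά\u201d"

# Reproduce the failure mode: ASCII cannot represent these characters.
try:
    sample.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False


def write_utf8(text: str, binary_stream) -> None:
    """Encode text as UTF-8 and write it to a binary stream,
    bypassing whatever default codec the text layer would use."""
    binary_stream.write(text.encode("utf-8"))


# Demonstrate with an in-memory binary stream.
buf = io.BytesIO()
write_utf8(sample, buf)
```

In a script, passing sys.stdout.buffer as the binary stream would write the UTF-8 bytes straight to the terminal. Setting the environment variable PYTHONIOENCODING=utf-8 before running is another way to make print() itself use UTF-8.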