training-word2vec Added UTF-8 encoding when opening files.

Open rodjuncode opened this issue 5 years ago • 0 comments

Hello! I was trying to train my own vector model, but got decoding errors during the line-break removal step. I found that forcing the encoding (encoding="utf-8") when opening the files fixes it, so I added it to both the input and output files. I also added "ensure_ascii=False" to the JSON dump, so non-ASCII characters are written as human-readable characters instead of "\u" escape sequences.
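For anyone hitting the same error, here is a minimal sketch of the idea, not the actual diff. The function names `remove_linebreaks` and `dump_vectors` and the file paths are hypothetical and do not come from the training script; only `encoding="utf-8"` and `ensure_ascii=False` reflect the change described above.

```python
import json


def remove_linebreaks(src_path, dst_path):
    """Copy src_path to dst_path, replacing each line break with a space.

    Hypothetical helper: both files are opened with an explicit UTF-8
    encoding so the result does not depend on the platform default.
    """
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line.rstrip("\r\n") + " ")


def dump_vectors(vectors, json_path):
    """Write a dict of word vectors to JSON (hypothetical helper).

    ensure_ascii=False keeps non-ASCII characters readable instead of
    escaping them as \\uXXXX sequences.
    """
    with open(json_path, "w", encoding="utf-8") as out:
        json.dump(vectors, out, ensure_ascii=False)
```

Without the explicit encoding, `open()` falls back to the locale's preferred encoding, which is often not UTF-8 on Windows. That is the likely source of the decoding errors on a UTF-8 corpus.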

Hope this helps, and thanks for the great work!

Aug 04 '20 18:08 rodjuncode
