training-word2vec

Added UTF-8 encoding when opening files.

Open rodjuncode opened this issue 5 years ago • 0 comments

Hello! I was trying to train my own vector model, but I got decoding errors during the line-break removal step. Forcing the encoding (encoding="utf-8") when opening the files fixed it, so I added it to both the input and output files. I also added "ensure_ascii=False" to the json dump, so that non-ASCII characters are written as human-readable characters instead of "\u" escapes.
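For reference, here is a minimal sketch of the two changes. The function names (remove_linebreaks, dump_vocab) and the exact line-joining logic are made up for illustration and are not the repository's actual code; only the encoding="utf-8" and ensure_ascii=False arguments reflect what this change adds.

```python
import json


def remove_linebreaks(src_path, dst_path):
    """Hypothetical helper: copy src_path to dst_path as a single line.

    Both files are opened with an explicit UTF-8 encoding, so the result
    does not depend on the platform's default encoding.
    """
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            stripped = line.strip()
            if stripped:  # skip empty lines
                dst.write(stripped + " ")


def dump_vocab(vocab, path):
    """Hypothetical helper: write a dict as JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes "ação" as-is instead of "a\u00e7\u00e3o".
        json.dump(vocab, f, ensure_ascii=False)
```

Without the explicit encoding, open() falls back to the locale's preferred encoding, which on some systems (Windows in particular) is often not UTF-8. That would explain why the error only shows up on some machines.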

Hope this helps, and thanks for the great work!

rodjuncode, Aug 04 '20 18:08