training-word2vec
Added UTF-8 encoding when opening files.
Hello! I was trying to train my own vector model, but got decoding errors during the line break removal step. Forcing the encoding (encoding="utf-8") when opening the files fixes it, so I added it to both the input and output files. I also added "ensure_ascii=False" to the JSON dump, so non-ASCII characters are written as human-readable characters instead of "\u" escape sequences.
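For reviewers, here is a minimal sketch of the two changes in isolation. It is not the repository's actual code: the helper names `remove_linebreaks` and `dump_json`, the file names, and the sample text are all hypothetical and only illustrate the pattern.

```python
import json
import os
import tempfile


def remove_linebreaks(src_path, dst_path):
    """Hypothetical helper: join the non-empty lines of src_path into one line."""
    # Explicit encoding on both files, so the platform default
    # (for example cp1252 on Windows) is never used.
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            stripped = line.strip()
            if stripped:
                dst.write(stripped + " ")


def dump_json(obj, path):
    """Hypothetical helper: write obj as JSON, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes "ç" as-is instead of "\u00e7".
        json.dump(obj, f, ensure_ascii=False)


# Small demonstration on temporary files (sample data is made up).
workdir = tempfile.mkdtemp()
corpus = os.path.join(workdir, "corpus.txt")
cleaned = os.path.join(workdir, "cleaned.txt")
vocab = os.path.join(workdir, "vocab.json")

with open(corpus, "w", encoding="utf-8") as f:
    f.write("olá mundo\n\ncoração\n")

remove_linebreaks(corpus, cleaned)
dump_json({"coração": 0}, vocab)
```

Without the explicit `encoding` argument, `open` falls back to the locale's preferred encoding, which is probably why the error only shows up on some machines.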
Hope this helps, and thanks for the great work!