sent-conv-torch

utf-8 decoding of Google word2vec embeddings incorrect

Open gargrohin opened this issue 5 years ago • 0 comments

In preprocess.py, in the function load_bin_vec, the line ch = f.read(1) is incorrect, at least for Python 3: on a file opened in binary mode it returns bytes rather than str. A .decode("utf-8") needs to be added to correctly decode the characters read from GoogleNews-vectors-negative300.bin.
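A minimal sketch of one way to apply the fix, assuming the usual word2vec binary layout (a text header with the vocabulary size and dimension, then for each entry the word, a space, and the float32 values). The names read_word and load_bin_vec_sketch are hypothetical and are not the code in preprocess.py. This variant decodes the whole token once instead of each byte, since a single byte of a multi-byte UTF-8 character cannot be decoded on its own.

```python
import io
import struct


def read_word(f):
    """Read one space-terminated token from a file opened in binary mode."""
    chars = []
    while True:
        ch = f.read(1)  # Python 3: a bytes object of length 1 (b"" at EOF)
        if ch == b"" or ch == b" ":
            break
        if ch != b"\n":  # skip the newline that separates records
            chars.append(ch)
    # Decode the full token so multi-byte UTF-8 characters stay intact.
    return b"".join(chars).decode("utf-8")


def load_bin_vec_sketch(f):
    """Hypothetical loader: returns {word: tuple of floats}."""
    vocab_size, dim = map(int, f.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        word = read_word(f)
        # Assumes little-endian float32 values, 4 bytes each.
        vectors[word] = struct.unpack("<%df" % dim, f.read(4 * dim))
    return vectors


# Tiny in-memory stand-in for the real .bin file (made-up data).
demo = io.BytesIO(
    b"2 3\n"
    + "caf\u00e9".encode("utf-8") + b" " + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n"
    + b"dog " + struct.pack("<3f", 0.5, 0.25, 0.0) + b"\n"
)
vecs = load_bin_vec_sketch(demo)
```

With the real embeddings, the file would be opened as open(path, "rb") and passed to the loader in the same way.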

Also, the code needs to be converted to Python 3 for use in 2020.

gargrohin • Jan 26 '20 19:01