sent-conv-torch
UTF-8 decoding of Google word2vec embeddings is incorrect
In `preprocess.py`, function `load_bin_vec`, the line `ch = f.read(1)` is incorrect, at least for Python 3: a file opened in binary mode returns `bytes` from `f.read(1)`, not `str`. It needs `.decode("utf-8")` to correctly decode characters from `GoogleNews-vectors-negative300.bin`.
Also, the code needs to be converted to Python 3 for use in 2020 and beyond.
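
For illustration, here is a minimal Python 3 sketch of the kind of change being asked for. It is not the repository's actual code: the helper `read_word`, the choice to pass an open binary stream instead of a filename, the little-endian float32 layout, and the use of `struct` are all assumptions made to keep the example self-contained. It also differs slightly from the one-line fix suggested above: it collects the raw bytes of a word and decodes them once, because calling `.decode("utf-8")` on a single byte raises `UnicodeDecodeError` when that byte is part of a multi-byte character.

```python
import io
import struct


def read_word(f):
    """Read one space-terminated token from a binary stream, decoded as UTF-8.

    Hypothetical helper: bytes are collected first and decoded once, so that
    multi-byte UTF-8 characters are not split across decode calls.
    """
    chunks = []
    while True:
        ch = f.read(1)  # in Python 3 this is bytes, e.g. b"a", not "a"
        if ch == b"":
            raise EOFError("unexpected end of file while reading a word")
        if ch == b" ":
            break
        if ch != b"\n":  # skip the newline that follows the previous vector
            chunks.append(ch)
    return b"".join(chunks).decode("utf-8")


def load_bin_vec(f, vocab):
    """Load vectors for the words in `vocab` from a word2vec-style binary stream.

    Assumes a header line "<vocab_size> <dim>\\n" followed by, for each word,
    the word, a space, and `dim` little-endian float32 values.
    """
    vocab_size, dim = map(int, f.readline().split())
    binary_len = 4 * dim  # 4 bytes per float32
    word_vecs = {}
    for _ in range(vocab_size):
        word = read_word(f)
        raw = f.read(binary_len)
        if word in vocab:
            word_vecs[word] = struct.unpack("<%df" % dim, raw)
    return word_vecs


# Tiny in-memory file with two 2-dimensional vectors, one for a non-ASCII word.
sample = (
    b"2 2\n"
    + "café".encode("utf-8") + b" " + struct.pack("<2f", 1.0, 2.0) + b"\n"
    + b"dog " + struct.pack("<2f", 3.0, 4.0) + b"\n"
)
vectors = load_bin_vec(io.BytesIO(sample), {"café"})
```

With a real embeddings file, the stream would come from `open(path, "rb")`; the comparisons against `b" "` and `b"\n"` are what make the loop work on `bytes`.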