wordvectors
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte
Hi,
I am trying to load Chinese pretrained word2vec vectors with
word_vectors = KeyedVectors.load_word2vec_format(path, binary=True)  # C binary format
and it throws this error.
Of course the vectors should have been trained with the proper codec; it seems the model was trained in a different encoding environment. Can you check that?
I have come across the same error. Can anybody help? Thank you ~
I came across the same error as well. I changed:
word_vectors = KeyedVectors.load_word2vec_format(path, binary=True)
into
word_vectors = KeyedVectors.load(path)
It turns out that load_word2vec_format is used when we're trying to load word vectors that were trained using the original implementation of word2vec (in C). Since these pre-trained word vectors were trained using Python (gensim), we can use load instead.
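A minimal side-by-side sketch of the two calls, in case it helps (path here is just a placeholder for wherever the downloaded file actually lives):

from gensim.models import KeyedVectors, Word2Vec

# Files saved from gensim itself (model.save(path)) are pickled objects,
# so they come back with .load():
word_vectors = KeyedVectors.load(path)
# model = Word2Vec.load(path)  # if a full Word2Vec model was saved instead

# Files in the original C word2vec binary format (e.g. GoogleNews-vectors-negative300.bin)
# need load_word2vec_format:
# word_vectors = KeyedVectors.load_word2vec_format(path, binary=True)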
@galuhsahid Thank you so much, it works now. : )
I have tried to read the files as you pointed out, but I got the following error:
File "C:\ProgramData\Anaconda2\lib\site-packages\gensim\models\base_any2vec.py", line 380, in syn1neg
self.trainables.syn1neg = value
AttributeError: 'Word2Vec' object has no attribute 'trainables'
:(
Same error as @anavaldi . Any solution?
I solved this error by running the .sh file on my own word embeddings.
I have come across the same error. I changed gensim.models.KeyedVectors.load_word2vec_format()
into gensim.models.Word2Vec.load().
Then it works.
@hinamu it works, Thanks
@anavaldi
I solved this error by running the .sh file on my own word embeddings.
What do you mean?
I have tried to read the files as you pointed out, but I got the following error:
File "C:\ProgramData\Anaconda2\lib\site-packages\gensim\models\base_any2vec.py", line 380, in syn1neg
self.trainables.syn1neg = value
AttributeError: 'Word2Vec' object has no attribute 'trainables'
:(
I solved this issue by downgrading my gensim version from 3.6 to 3.0
UnpicklingError Traceback (most recent call last)
@kusumlata123 even I am getting that UnpicklingError
I am also getting the unpickling error... Any ideas? My code is:
chinese_model = gensim.models.Word2Vec.load(os.path.join(desktop, 'cc.zh.300.bin.gz'))
I also tried to save the text file and load it via the function provided on the official fastText site. I first changed the file extension from gz to txt and used the following function:
import io
import os

# Reader from the official fastText site: the first line of a .vec file is
# "<vocab size> <dimension>", and each following line is "<word> <floats>".
def load_vectors(fname):
    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    n, d = map(int, fin.readline().split())
    data = {}
    for line in fin:
        tokens = line.rstrip().split(' ')
        data[tokens[0]] = map(float, tokens[1:])
    return data

model = load_vectors(os.path.join(desktop, 'cc.zh.300.vec.txt'))
However, I got the following errors:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-d67f52bde947> in <module>
----> 1 model = load_vectors(os.path.join(desktop, 'cc.zh.300.vec.txt'))
<ipython-input-3-0f69b5ce62b8> in load_vectors(fname)
1 def load_vectors(fname):
2 fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
----> 3 n, d = map(int, fin.readline().split())
4 data = {}
5 for line in fin:
ValueError: invalid literal for int() with base 10: '\x08\x08p[\x00\x03cc.zh.300.vec\x00\\ͮfMr7?W3ۀ0|Szдl\x14I\x132'
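For what it's worth, those bytes in the ValueError look like raw gzip data: renaming cc.zh.300.vec.gz to .txt does not decompress it, so the reader still sees the compressed stream. A minimal sketch of what might work instead, reusing the desktop variable from the snippet above (the exact file names are assumptions):

import gzip
import os
import shutil
from gensim.models import KeyedVectors

src = os.path.join(desktop, 'cc.zh.300.vec.gz')  # still-compressed download (assumed name)
dst = os.path.join(desktop, 'cc.zh.300.vec')

# Actually decompress the archive instead of just renaming it.
with gzip.open(src, 'rb') as f_in, open(dst, 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)

# The .vec file is plain-text word2vec format, so binary=False applies here.
word_vectors = KeyedVectors.load_word2vec_format(dst, binary=False)

# For the .bin model file, gensim's fastText loader is probably the right route:
# from gensim.models.fasttext import load_facebook_vectors
# word_vectors = load_facebook_vectors(os.path.join(desktop, 'cc.zh.300.bin'))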
I tried the above solution but I am getting this error: UnpicklingError: invalid load key, '\x1f'
My code:
from gensim import models
word2vec_path = 'GoogleNews-vectors-negative300.bin.gz.2'
word2vec = models.KeyedVectors.load(word2vec_path)
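That '\x1f' is the first byte of a gzip stream, and KeyedVectors.load expects a gensim-pickled model, whereas the GoogleNews file is in the original C binary format. A minimal sketch of the call that usually works for this file (the rename is an assumption, since gensim detects compression from the .gz extension):

from gensim.models import KeyedVectors

# The GoogleNews vectors are distributed in the original C binary word2vec format,
# so they need load_word2vec_format rather than load.
word2vec_path = 'GoogleNews-vectors-negative300.bin.gz'  # renamed from ...bin.gz.2
word2vec = KeyedVectors.load_word2vec_format(word2vec_path, binary=True)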
I came across the same error as well. I changed:
word_vectors = KeyedVectors.load_word2vec_format(path, binary=True)
into
word_vectors = KeyedVectors.load(path)
It turns out that load_word2vec_format is used when we're trying to load word vectors that were trained using the original implementation of word2vec (in C). Since these pre-trained word vectors were trained using Python (gensim), we can use load instead.
When I tried this, I am getting: UnpicklingError: unpickling stack underflow
For the Korean language, I got this error:
AttributeError: Can't get attribute 'Vocab' on <module 'gensim.models.word2vec' from 'C:\Users\ductr\Python\lib\site-packages\gensim\models\word2vec.py'>
Would you mind letting me know what the error is?
I tried the above solution but I am getting this error: UnpicklingError: invalid load key, '\x1f'
My code:
from gensim import models
word2vec_path = 'GoogleNews-vectors-negative300.bin.gz.2'
word2vec = models.KeyedVectors.load(word2vec_path)
I get the same error after using:
from gensim.models import Word2Vec
from gensim.models.keyedvectors import KeyedVectors
model = Word2Vec.load(model_path)
What am I doing wrong?