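Error when loading pretrained word vectors
In the language model example, I saw that it loads the pretrained model word2vec.6B.100d. I am using glove.6B.50d instead, but it raises an error. Can anyone help?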
Traceback (most recent call last):
  File "D:/DesktopBackup/right/MLHomework/AllenNLP/[NLP]Pytorch17_torchTextDemo.py", line 75, in <module>
    wvmodel = gensim.models.KeyedVectors.load_word2vec_format(r'D:\DesktopBackup\right\MLHomework\AllenNLP\data\glove.6B.50d.txt', binary=False, encoding='utf-8')
  File "C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py", line 1476, in load_word2vec_format
    limit=limit, datatype=datatype)
  File "C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\utils_any2vec.py", line 344, in _load_word2vec_format
    vocab_size, vector_size = (int(x) for x in header.split())  # throws for invalid file format
  File "C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\utils_any2vec.py", line 344, in <genexpr>
    vocab_size, vector_size = (int(x) for x in header.split())  # throws for invalid file format
ValueError: invalid literal for int() with base 10: 'the'
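word2vec and GloVe use different file formats. The word2vec text format begins with a header line giving the vocabulary size and the vector dimension, whereas a GloVe file starts directly with the first word and its vector. That is why gensim tries to parse 'the' as an integer and fails. You need to convert the GloVe file to word2vec format; gensim has a utility for this.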
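To make the format difference concrete, here is a minimal sketch of the conversion using only the standard library. The helper name `add_word2vec_header` and the toy file contents are assumptions made for illustration; they are not part of gensim or of the original script. The sketch assumes a space-separated GloVe text file with one word per line.

```python
import os
import tempfile


def add_word2vec_header(glove_path, output_path, encoding="utf-8"):
    """Hypothetical helper: copy a GloVe text file to output_path and
    prepend the '<vocab_size> <vector_size>' header line that the
    word2vec text format expects."""
    vocab_size = 0
    vector_size = 0
    # First pass: count the vectors and read the dimension from the first line.
    with open(glove_path, encoding=encoding) as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            if vocab_size == 0:
                # Each line is: word v1 v2 ... vN
                vector_size = len(line.rstrip().split(" ")) - 1
            vocab_size += 1
    # Second pass: write the header, then the vectors unchanged.
    with open(output_path, "w", encoding=encoding) as dst:
        dst.write(f"{vocab_size} {vector_size}\n")
        with open(glove_path, encoding=encoding) as src:
            for line in src:
                if line.strip():
                    dst.write(line.rstrip("\n") + "\n")
    return vocab_size, vector_size


# Toy demonstration with a made-up two-word, three-dimensional file.
with tempfile.TemporaryDirectory() as tmp:
    glove_file = os.path.join(tmp, "toy_glove.txt")
    w2v_file = os.path.join(tmp, "toy_word2vec.txt")
    with open(glove_file, "w", encoding="utf-8") as f:
        f.write("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
    counts = add_word2vec_header(glove_file, w2v_file)
    with open(w2v_file, encoding="utf-8") as f:
        header = f.readline().strip()
```

With a file converted this way, the `load_word2vec_format(..., binary=False)` call from the traceback should find the header it expects. In practice gensim's own tools are probably the better choice: `gensim.scripts.glove2word2vec.glove2word2vec(glove_input_file, word2vec_output_file)` performs this conversion, and in gensim 4.0 and later you can skip the conversion by passing `no_header=True` to `KeyedVectors.load_word2vec_format`. Which of these is available depends on the installed gensim version.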