GloVe
How can I select a certain word vector?
Hi, I'm new to NLP, and I'd like to select the word vectors for a specific set of categories. I want the word vectors for the MS-COCO class names, such as 'Person', 'Bus', 'Bird'... I downloaded a pretrained file, but it contains no description of category labels, and I couldn't figure out how to select a particular word vector.
Could anyone help me?
If you have downloaded a pre-trained file, e.g. glove.42B.300d.txt or any other set of GloVe vectors, one way to extract specific word vectors is to use scripts.glove2word2vec from Gensim.
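from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec
from gensim.test.utils import get_tmpfile
# Path to the file you downloaded (datapath() would look inside gensim's own test data directory instead)
glove_file = 'glove.42B.300d.txt'
# Temporary file that will hold the vectors in word2vec text format
tmp_file = get_tmpfile("glove_test_word2vec.txt")
# Convert by writing the "<vocab size> <dimensions>" header line that word2vec expects
_ = glove2word2vec(glove_file, tmp_file)
# Load the converted vectors
model = KeyedVectors.load_word2vec_format(tmp_file)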
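This script converts GloVe vectors into the word2vec text format. With gensim 4.0 or later, the conversion step can be skipped by loading the GloVe file directly with KeyedVectors.load_word2vec_format(glove_file, binary=False, no_header=True).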
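If only a handful of vectors are needed, the GloVe text file can also be read directly, since each line is a word followed by its space-separated values. The following is a minimal sketch under that assumption; `load_selected_vectors` and `sample_lines` are hypothetical names used for illustration, not part of GloVe or gensim, and the sample values are made up so the snippet runs without the download.

```python
from typing import Dict, Iterable, List, Set


def load_selected_vectors(lines: Iterable[str], wanted: Set[str]) -> Dict[str, List[float]]:
    """Collect vectors for the words in `wanted` from GloVe text-format lines."""
    found: Dict[str, List[float]] = {}
    for line in lines:
        # Assumption: the word is everything before the first space,
        # and the remainder of the line is the vector.
        word, _, rest = line.rstrip("\n").partition(" ")
        if word in wanted:
            found[word] = [float(x) for x in rest.split()]
            if len(found) == len(wanted):
                break  # stop early once every requested word is found
    return found


# Hypothetical usage with the real file:
# with open("glove.42B.300d.txt", encoding="utf-8") as f:
#     vectors = load_selected_vectors(f, {"person", "bus", "bird"})

# Tiny made-up stand-in for the file contents.
sample_lines = [
    "person 0.1 0.2 0.3\n",
    "bus 0.4 0.5 0.6\n",
    "bird 0.7 0.8 0.9\n",
]
vectors = load_selected_vectors(sample_lines, {"person", "bird"})
print(sorted(vectors))  # ['bird', 'person']
```

Reading line by line avoids holding the whole vocabulary in memory, which can matter for a file as large as glove.42B.300d.txt. Words that are not in the file are simply absent from the returned dict.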
Now use model['key'] to get the vector for a given word. glove.42B.300d is uncased, so look up class names in lower case, e.g.
model['person']
# array([ 9.6294e-02, 7.3925e-01, -4.1032e-01,....],dtype=float32)
# type: numpy.ndarray
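For a whole list of class names, a small helper can lower-case each name and report the ones that have no vector. This is only a sketch: `class_name_vectors` is a hypothetical helper, and averaging the word vectors of a multi-word name such as 'traffic light' is an assumption made here for illustration, not something GloVe or gensim prescribes. The `lookup` argument can be any object that supports `in` and `[]`, such as a plain dict or the gensim model above; the toy dict below uses made-up values.

```python
from typing import Dict, List, Sequence, Tuple


def class_name_vectors(lookup, class_names: Sequence[str]) -> Tuple[Dict[str, List[float]], List[str]]:
    """Return ({class name: vector}, [class names with no vector])."""
    vectors: Dict[str, List[float]] = {}
    missing: List[str] = []
    for name in class_names:
        words = name.lower().split()  # uncased vocabulary: 'Person' -> 'person'
        if not words or any(w not in lookup for w in words):
            missing.append(name)
            continue
        parts = [lookup[w] for w in words]
        # Assumption: represent a multi-word name by the element-wise mean.
        vectors[name] = [sum(col) / len(parts) for col in zip(*parts)]
    return vectors, missing


# Toy lookup with made-up 2-d vectors so the sketch runs on its own.
toy = {"person": [1.0, 0.0], "traffic": [1.0, 0.0], "light": [0.0, 1.0]}
vectors, missing = class_name_vectors(toy, ["Person", "Traffic Light", "Frisbee"])
print(vectors["Traffic Light"])  # [0.5, 0.5]
print(missing)                   # ['Frisbee']
```

Collecting the missing names instead of raising makes it easy to see which class names need a manual substitute.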