How can I select a certain word vector?

Open mandal4 opened this issue 5 years ago • 1 comments

Hi, I'm new to NLP, and I'd like to select the word vectors for certain categories. I want the word vectors for MS-COCO class names, such as 'Person', 'Bus', 'Bird'... I downloaded the pretrained file, but it contains no description of categorical labels, and I couldn't find out how to select a particular word vector.

Could anyone help me?

mandal4 avatar Jan 29 '20 02:01 mandal4

If you have downloaded a pre-trained file, e.g. glove.42B.300d.txt or any other GloVe vectors, one way to extract particular word vectors is to use scripts.glove2word2vec from Gensim.

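from gensim.test.utils import get_tmpfile
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec

# Path to the downloaded GloVe file; adjust it to wherever you saved the file.
# (datapath() would point into Gensim's own test-data folder instead.)
glove_file = 'glove.42B.300d.txt'
tmp_file = get_tmpfile("glove_test_word2vec.txt")

# Convert the GloVe text file to word2vec text format
_ = glove2word2vec(glove_file, tmp_file)

# Load the converted vectors
model = KeyedVectors.load_word2vec_format(tmp_file)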

This script converts GloVe vectors into the word2vec format.
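As a side note, Gensim 4.0 and later can read the GloVe text file directly, which makes the conversion step optional. The sketch below assumes Gensim 4.x is installed; the helper name `load_glove_keyed_vectors` is only illustrative, not part of Gensim.

```python
def load_glove_keyed_vectors(glove_path):
    """Load a GloVe text file into a KeyedVectors object (assumes Gensim >= 4.0)."""
    # Imported lazily so the helper can be defined even where Gensim is absent.
    from gensim.models import KeyedVectors

    # no_header=True tells Gensim the file lacks the "<vocab_size> <dim>"
    # first line that the word2vec text format normally starts with.
    return KeyedVectors.load_word2vec_format(glove_path, binary=False, no_header=True)


# Usage (the path is an assumption; point it at your own download):
# model = load_glove_keyed_vectors('glove.42B.300d.txt')
```

Either way you end up with the same kind of `model` object.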

Now, use model['key'] to get your desired word vectors.

e.g.

model['person']
# array([ 9.6294e-02,  7.3925e-01, -4.1032e-01,....],dtype=float32) 
# type: numpy.ndarray
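If you only need a handful of vectors (for example the MS-COCO class names), you may not need to load the whole model. Below is a minimal sketch that scans the GloVe text file and keeps only the requested words. It assumes the usual GloVe layout of one token per line followed by space-separated floats, and it lowercases the requested names because glove.42B.300d is an uncased vocabulary, so 'Person' has to be looked up as 'person'. The function name `load_selected_vectors` is hypothetical.

```python
def load_selected_vectors(path, words):
    """Return {word: [floats]} for the requested words found in a GloVe text file."""
    # Lowercase the requested names (assumes an uncased GloVe vocabulary).
    wanted = {w.lower() for w in words}
    found = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Each line is assumed to look like: "<token> <v1> <v2> ... <vN>"
            token, _, rest = line.rstrip("\n").partition(" ")
            if token in wanted:
                found[token] = [float(x) for x in rest.split()]
                # Stop early once every requested word has been found.
                if len(found) == len(wanted):
                    break
    return found


# Usage (the path is an assumption; point it at your own download):
# coco_vectors = load_selected_vectors('glove.42B.300d.txt', ['Person', 'Bus', 'Bird'])
# coco_vectors['person']  # list of 300 floats
```

Words that are not in the file are simply missing from the returned dict, so it is worth checking which class names came back. Multi-word class names (such as 'traffic light') have no single entry and would need separate handling, for instance averaging the vectors of the individual words.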

JaganKaartik avatar Feb 02 '20 18:02 JaganKaartik