Structured-Self-Attentive-Sentence-Embedding

About GloVe model

Open • jx00109 opened this issue 7 years ago • 1 comment

Recently I used torchtext to get the GloVe model. With this module I got the dictionary that maps words to indices and the embedding matrix (shape word_count × dim, torch.FloatTensor). To create the file that train.py loads, I wrote this code:

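import torch
# dictionary: word -> index; embedding_matrix: torch.FloatTensor of shape (word_count, dim)
t = (dictionary, embedding_matrix, dim)
torch.save(t, 'mypath/glove.pt')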

Is glove.pt in the format your program expects?

jx00109, Nov 08 '17 07:11
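One way to answer the format question is to load the file back and check its structure. The sketch below assumes, without confirmation from the repository, that train.py unpacks the file as `dictionary, vectors, dim = torch.load(path)`; the helper name `check_glove_tuple` is hypothetical. It only uses `vectors.shape`, so it does not import torch itself.

```python
def check_glove_tuple(obj):
    """Validate a loaded (dictionary, vectors, dim) tuple; return the vectors' shape.

    Hypothetical helper. Only duck-typed attributes are used: a torch.FloatTensor
    exposes `.shape` as a torch.Size, which is a tuple subclass.
    """
    if not isinstance(obj, (tuple, list)) or len(obj) != 3:
        raise ValueError("expected (dictionary, vectors, dim)")
    dictionary, vectors, dim = obj
    # torchtext's vocab.stoi is a defaultdict, which is a dict subclass
    if not isinstance(dictionary, dict):
        raise ValueError("dictionary must map word -> row index")
    shape = tuple(vectors.shape)
    if len(shape) != 2:
        raise ValueError(f"vectors must be 2-D, got shape {shape}")
    if shape[1] != dim:
        raise ValueError(f"dim={dim} but vectors have {shape[1]} columns")
    # Every index in the dictionary must point at an existing row of vectors
    if dictionary and max(dictionary.values()) >= shape[0]:
        raise ValueError("dictionary contains an index past the last row of vectors")
    return shape
```

Usage would look like `check_glove_tuple(torch.load('mypath/glove.pt'))`. On recent PyTorch releases `torch.load` defaults to `weights_only=True`, which may refuse to unpickle the dictionary object; for a file you created yourself, passing `weights_only=False` may be needed.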

This is how I created the GloVe model:


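import os

import torch
from torchtext import data  # legacy API; torchtext 0.9 to 0.11 moved it to torchtext.legacy.data

TEXT = data.Field(sequential=True)
LABEL = data.Field(sequential=False)

train, val, test = data.TabularDataset.splits(
        path='./', train='train.json',
        validation='val.json', test='test.json', format='json',
        fields={'text': ('text', TEXT),
                'label': ('label', LABEL)})

# Build the vocabulary from the training split and attach the pretrained GloVe vectors
TEXT.build_vocab(train, vectors="glove.42B.300d")

dictionary = TEXT.vocab.stoi   # word -> index
vectors = TEXT.vocab.vectors   # FloatTensor of shape (vocab_size, dim)
dim = vectors.size(1)          # 300 in this case

os.makedirs('./GloVe', exist_ok=True)
torch.save((dictionary, vectors, dim), './GloVe/glove.42B.300d.pt')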
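On the consuming side, a model would typically copy the GloVe row of every word it shares with `dictionary` into its own embedding matrix. This is a minimal sketch, not the repository's code: the function `copy_pretrained` and the parameter `model_words` (a sequence where position i holds the model's word i) are hypothetical, and `weight` stands for something like an `nn.Embedding` weight whose width equals `dim`.

```python
def copy_pretrained(weight, model_words, glove_stoi, glove_vectors):
    """Copy GloVe rows into `weight` for every word both vocabularies share.

    weight:        mutable 2-D matrix indexed by model word index (hypothetical)
    model_words:   sequence where position i holds the model's word i (hypothetical)
    glove_stoi:    the saved `dictionary` (word -> GloVe row index)
    glove_vectors: the saved `vectors` matrix
    Returns the number of rows copied; all other rows keep their initial values.
    """
    copied = 0
    for model_idx, word in enumerate(model_words):
        # Test membership first: indexing a defaultdict with an unknown word
        # would insert that word and silently return the <unk> index.
        if word in glove_stoi:
            weight[model_idx] = glove_vectors[glove_stoi[word]]
            copied += 1
    return copied
```

With PyTorch the call would go inside `with torch.no_grad():`, since an in-place write to a leaf tensor that requires grad raises an error otherwise. The membership check matters because torchtext's legacy `stoi` may be a defaultdict that maps unknown words to the `<unk>` index instead of raising `KeyError`.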

Inspired by:

  • http://anie.me/On-Torchtext/

  • Lines 24-31 of https://github.com/pytorch/examples/blob/master/snli/train.py

DenisDsh, Jun 21 '18 21:06