Could you point me to glove.6B.300d.pickle?

Open aerinkim opened this issue 5 years ago • 2 comments

Usually it's glove.6B.300d.txt, but I think you did some preprocessing here. I'd appreciate it if you could share how you pickled it.

aerinkim avatar Oct 24 '18 22:10 aerinkim

This is how I generated it:

import pickle
import numpy as np

word_dict = {}
wordvec = []

# GloVe files are UTF-8 text: each line is a token followed by its vector components.
with open('glove.6B.300d.txt', 'r', encoding='utf-8') as f:
    for idx, line in enumerate(f):
        # Strip only the newline, so a final line without one keeps its last digit.
        word_split = line.rstrip('\n').split(' ')
        word = word_split[0]
        word_dict[word] = idx  # token -> row index in the embedding matrix
        wordvec.append([float(e) for e in word_split[1:]])

embedding = np.array(wordvec)
pickling = {'embedding': embedding, 'word_dict': word_dict}

with open('glove.6B.300d_pickle', 'wb') as g:
    pickle.dump(pickling, g)

rjadr avatar Apr 13 '19 06:04 rjadr

@rjadr To use it with the paraphraser code, you need some changes:

import pickle
import numpy as np

word_to_id = {}
id_to_word = {}
wordvec = []

# GloVe files are UTF-8 text: each line is a token followed by its vector components.
with open('glove.6B.300d.txt', 'r', encoding='utf-8') as f:
    for idx, line in enumerate(f):
        # Strip only the newline, so a final line without one keeps its last digit.
        word_split = line.rstrip('\n').split(' ')
        word = word_split[0]
        word_to_id[word] = idx
        id_to_word[idx] = word
        wordvec.append([float(e) for e in word_split[1:]])

embedding = np.array(wordvec)

# Stored as a tuple in this order: (word_to_id, id_to_word, embedding)
pickling = word_to_id, id_to_word, embedding

with open('glove.6B.300d.pickle', 'wb') as g:
    pickle.dump(pickling, g)

ott-fogliata avatar Jul 05 '19 14:07 ott-fogliata
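
To check a pickle written in the tuple format above, it can be loaded back and verified. This is a minimal sketch, not part of the paraphraser code: `load_glove_pickle` and the tiny two-word demo file are hypothetical stand-ins so the example runs without the real GloVe data, and it assumes the tuple order (word_to_id, id_to_word, embedding) used in the script above.

```python
import os
import pickle
import tempfile

import numpy as np


def load_glove_pickle(path):
    """Load a (word_to_id, id_to_word, embedding) tuple from a pickle file.

    Hypothetical helper; assumes the tuple order used by the script above.
    """
    with open(path, 'rb') as f:
        word_to_id, id_to_word, embedding = pickle.load(f)
    return word_to_id, id_to_word, embedding


# Tiny stand-in for glove.6B.300d.pickle (2 words, 2 dimensions).
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.pickle')
demo = (
    {'the': 0, 'cat': 1},
    {0: 'the', 1: 'cat'},
    np.array([[0.1, 0.2], [0.3, 0.4]]),
)
with open(demo_path, 'wb') as g:
    pickle.dump(demo, g)

word_to_id, id_to_word, embedding = load_glove_pickle(demo_path)

# Sanity checks: one embedding row per word, and the two mappings are inverses.
assert embedding.shape[0] == len(word_to_id) == len(id_to_word)
assert all(id_to_word[i] == w for w, i in word_to_id.items())

# Look up a word's vector through its row index.
cat_vector = embedding[word_to_id['cat']]
```

With the real file, replace `demo_path` with the path to glove.6B.300d.pickle; the embedding matrix should then have 300 columns.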