seq2seq: Replace the embeddings with pre-trained word embeddings such as word2vec
Hi, thank you for your tutorial! I tried to replace the embeddings with pre-trained word embeddings such as word2vec. Here is my code:
from gensim.models import KeyedVectors

class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS
        self.word2vec = None  # cached so the vectors file is only read once

    def get_word2vec(self):
        if self.word2vec is None:
            self.word2vec = KeyedVectors.load_word2vec_format('Models/Word2Vec/wiki.he.vec')
        return self.word2vec

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            # get_word2vec is a method, so it has to be called before indexing;
            # this stores the word's 300-d vector (the original tutorial stores
            # an integer index here instead)
            self.word2index[word] = self.get_word2vec()[word]
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1
The dimensionality of this word2vec model is 300. Do I need to change anything else in my Encoder?
Thank you!
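One detail worth flagging before changing the Encoder: the tutorial's pipeline converts each sentence into a tensor of integer indices, and the lookup from index to vector happens inside nn.Embedding. So instead of storing the 300-d vectors in word2index, a common approach is to keep word2index as integer ids (as in the original tutorial) and copy the word2vec vectors into the embedding layer's weight matrix. A minimal sketch, assuming the Lang class above; the helper name build_embedding and the zero init for SOS/EOS/out-of-vocabulary words are my choices, not part of the tutorial:

import torch
import torch.nn as nn

EMBED_DIM = 300  # must match the word2vec file loaded above

def build_embedding(lang, w2v):
    # weight matrix laid out so row i corresponds to lang.index2word[i]
    weights = torch.zeros(lang.n_words, EMBED_DIM)
    for idx, word in lang.index2word.items():
        if word in w2v:
            weights[idx] = torch.tensor(w2v[word])
        # SOS, EOS, and out-of-vocabulary words keep the zero vector here;
        # random initialization is another common choice
    # freeze=False lets the pre-trained vectors be fine-tuned during training
    return nn.Embedding.from_pretrained(weights, freeze=False)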
Yeah, I'm also trying to train with word2vec. Word2vec vectors can be 100d, 200d, or 300d, i.e. a 1-D array with 100 values per word for a 100d model.
Can anyone help me with where I should change the dimension values? For example, what values should be replaced in the lines below?
self.embedding(input).view(1, 1, -1)
return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)
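Neither of those two lines needs to change: .view(1, 1, -1) infers the last dimension automatically, and indexes stays a list of integer word ids whatever the embedding size is. What does need to change is the encoder, where the GRU input size must match the 300-d vectors. A sketch of the tutorial's EncoderRNN adapted for pre-trained weights; pretrained_weights is assumed to be an (n_words, 300) matrix like the one sketched above:

import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, hidden_size, pretrained_weights):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        # embedding dim is now 300 (from word2vec) rather than hidden_size
        self.embedding = nn.Embedding.from_pretrained(pretrained_weights, freeze=False)
        # the GRU input size must match the embedding dim
        self.gru = nn.GRU(pretrained_weights.size(1), hidden_size)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output, hidden = self.gru(embedded, hidden)
        return output, hidden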
@Liranbz Did you get this sorted out?
@NarenInD @Liranbz Have you found a solution? I have also been looking for the same thing. Thank you.
torchtext currently supports pretrained GloVe, FastText, and CharNGram embeddings. Other embeddings can be loaded using torchtext.vocab.Vectors. If anyone is interested, I can edit the tutorial to show how you could use those.
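For reference, a minimal sketch of that route, reusing the wiki.he.vec file from above (the cache directory is a placeholder; Vectors accepts any word2vec-format text file with one "word v1 ... vn" entry per line):

import torch.nn as nn
from torchtext.vocab import Vectors, GloVe

custom = Vectors(name='wiki.he.vec', cache='Models/Word2Vec')  # any w2v-format file
glove = GloVe(name='6B', dim=100)  # one of the bundled sets, downloaded on first use

vec = custom['hello']  # torch.Tensor of shape (custom.dim,); zeros for OOV by default
embedding = nn.Embedding.from_pretrained(custom.vectors)  # plug straight into a model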