practical-pytorch

seq2seq: Replace the embeddings with pre-trained word embeddings such as word2vec

Open · Liranbz opened this issue 5 years ago · 0 comments

Hi, thank you for your tutorial! I tried to replace the embeddings with pre-trained word embeddings such as word2vec. Here is my code:

```python
from gensim.models import KeyedVectors


class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}  # word -> integer index (what nn.Embedding expects)
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS

    @staticmethod
    def get_word2vec(path='Models/Word2Vec/wiki.he.vec'):
        # Load the pre-trained vectors once; do not call this inside addWord
        return KeyedVectors.load_word2vec_format(path)

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            # Store the integer index; the 300-d vector belongs in the
            # embedding weights, not in word2index.
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1
```
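Because the encoder looks words up by integer index, the pre-trained vectors are usually copied into an embedding weight matrix whose row `i` holds the vector for word `i`. A minimal sketch, assuming `word2index` maps each word to its integer index and `vectors` is the loaded `KeyedVectors` object (or anything supporting `word in vectors` and `vectors[word]`). `build_embedding_matrix` is a hypothetical helper, not part of the tutorial or gensim; giving unknown words and the SOS/EOS rows small random vectors is an assumption (zeros would also work).

```python
import numpy as np


def build_embedding_matrix(word2index, n_words, vectors, dim=300, seed=0):
    """Return an (n_words, dim) float32 matrix; row i is the vector for word i.

    Hypothetical helper. Words missing from `vectors`, plus the SOS/EOS rows
    (indices 0 and 1), keep a small random initialization.
    """
    rng = np.random.default_rng(seed)
    matrix = rng.normal(scale=0.1, size=(n_words, dim)).astype(np.float32)
    for word, idx in word2index.items():
        if word in vectors:
            matrix[idx] = np.asarray(vectors[word], dtype=np.float32)
    return matrix


# Toy stand-in for KeyedVectors: two known words with 4-d vectors
toy_vectors = {"hello": [1.0, 2.0, 3.0, 4.0], "world": [0.5, 0.5, 0.5, 0.5]}
toy_word2index = {"hello": 2, "world": 3, "unknown": 4}
emb = build_embedding_matrix(toy_word2index, n_words=5, vectors=toy_vectors, dim=4)
print(emb.shape)  # (5, 4)
```

With the real data this would be called as `build_embedding_matrix(lang.word2index, lang.n_words, Lang.get_word2vec())`, loading the vectors once after all sentences have been added.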

The embedding dimension of this word2vec model is 300. Do I need to change anything else in my Encoder?
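On the Encoder side, a minimal sketch, assuming an encoder shaped like the tutorial's `EncoderRNN`, where the embedding output feeds straight into a GRU. If the embedding size is tied to `hidden_size` there, two things change: the embedding weights come from the pre-trained matrix via `nn.Embedding.from_pretrained`, and the GRU's input size becomes the embedding dimension (300) instead of `hidden_size`. `PretrainedEncoderRNN` and its parameters are hypothetical names for illustration.

```python
import torch
import torch.nn as nn


class PretrainedEncoderRNN(nn.Module):
    """Hypothetical GRU encoder whose embedding layer starts from pre-trained vectors."""

    def __init__(self, embedding_matrix, hidden_size, n_layers=1, freeze=False):
        super().__init__()
        weights = torch.as_tensor(embedding_matrix, dtype=torch.float32)
        # from_pretrained infers vocab size and embedding_dim from the weights;
        # freeze=True keeps the word2vec vectors fixed during training
        self.embedding = nn.Embedding.from_pretrained(weights, freeze=freeze)
        embedding_dim = weights.size(1)
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        # The GRU input size must match the embedding dimension, not hidden_size
        self.gru = nn.GRU(embedding_dim, hidden_size, n_layers)

    def forward(self, input_seqs, hidden=None):
        # input_seqs: (seq_len, batch) LongTensor of word indices
        embedded = self.embedding(input_seqs)         # (seq_len, batch, embedding_dim)
        outputs, hidden = self.gru(embedded, hidden)  # outputs: (seq_len, batch, hidden_size)
        return outputs, hidden


# Random stand-in for the real (n_words, 300) matrix
vocab_size, emb_dim, hidden = 10, 300, 256
encoder = PretrainedEncoderRNN(torch.randn(vocab_size, emb_dim), hidden_size=hidden)
tokens = torch.randint(0, vocab_size, (7, 2))  # seq_len=7, batch=2
outputs, h = encoder(tokens)
print(outputs.shape, h.shape)  # torch.Size([7, 2, 256]) torch.Size([1, 2, 256])
```

An alternative that leaves the GRU untouched is to set `hidden_size=300`, so the embedding and hidden sizes still match; decoupling them as above lets `hidden_size` be chosen independently of the word2vec dimension.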

Thank you!

Liranbz · Jul 16 '20 12:07