seq2seq: Replace the embeddings with pre-trained word embeddings such as word2vec
Hi, thank you for your tutorial! I tried to replace the embeddings with pre-trained word embeddings such as word2vec. Here is my code:
from gensim.models import KeyedVectors

class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS
        self.word2vec = None  # cached so the vectors file is only read once

    def get_word2vec(self):
        if self.word2vec is None:
            self.word2vec = KeyedVectors.load_word2vec_format('Models/Word2Vec/wiki.he.vec')
        return self.word2vec

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            # get_word2vec is a method, so it has to be called before indexing;
            # this stores the word's 300-d vector (the original tutorial stores
            # an integer index here instead)
            self.word2index[word] = self.get_word2vec()[word]
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1
The dimensionality of this word2vec model is 300. Do I need to change anything else in my Encoder?
Thank you!
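One detail worth flagging before changing the Encoder: the tutorial's pipeline converts each sentence into a tensor of integer indices, and the lookup from index to vector happens inside nn.Embedding. So instead of storing the 300-d vectors in word2index, a common approach is to keep word2index as integer ids (as in the original tutorial) and copy the word2vec vectors into the embedding layer's weight matrix. A minimal sketch, assuming the Lang class above; the helper name build_embedding and the zero init for SOS/EOS/out-of-vocabulary words are my choices, not part of the tutorial:

import torch
import torch.nn as nn

EMBED_DIM = 300  # must match the word2vec file loaded above

def build_embedding(lang, w2v):
    # weight matrix laid out so row i corresponds to lang.index2word[i]
    weights = torch.zeros(lang.n_words, EMBED_DIM)
    for idx, word in lang.index2word.items():
        if word in w2v:
            weights[idx] = torch.tensor(w2v[word])
        # SOS, EOS, and out-of-vocabulary words keep the zero vector here;
        # random initialization is another common choice
    # freeze=False lets the pre-trained vectors be fine-tuned during training
    return nn.Embedding.from_pretrained(weights, freeze=False)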
Yeah, I'm also trying to train with word2vec. Word2vec vectors can be 100d, 200d, or 300d, i.e. a 1-D array with 100 values per word for a 100d model.
Can anyone help me with where I should change the dimension values? For example, what values should be replaced in the lines below?
self.embedding(input).view(1, 1, -1)
return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)
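Neither of those two lines needs to change: .view(1, 1, -1) infers the last dimension automatically, and indexes stays a list of integer word ids whatever the embedding size is. What does need to change is the encoder, where the GRU input size must match the 300-d vectors. A sketch of the tutorial's EncoderRNN adapted for pre-trained weights; pretrained_weights is assumed to be an (n_words, 300) matrix like the one sketched above:

import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, hidden_size, pretrained_weights):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        # embedding dim is now 300 (from word2vec) rather than hidden_size
        self.embedding = nn.Embedding.from_pretrained(pretrained_weights, freeze=False)
        # the GRU input size must match the embedding dim
        self.gru = nn.GRU(pretrained_weights.size(1), hidden_size)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output, hidden = self.gru(embedded, hidden)
        return output, hidden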
@Liranbz Did you get this sorted out?
@NarenInD @Liranbz Have you found a solution? I have also been looking for the same thing. Thank you.
torchtext currently supports pretrained GloVe, FastText, and CharNGram embeddings. Other embeddings can be loaded using torchtext.vocab.Vectors. If anyone is interested, I can edit the tutorial to show how you could use those.
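For reference, a minimal sketch of that route, reusing the wiki.he.vec file from above (the cache directory is a placeholder; Vectors accepts any word2vec-format text file with one "word v1 ... vn" entry per line):

import torch.nn as nn
from torchtext.vocab import Vectors, GloVe

custom = Vectors(name='wiki.he.vec', cache='Models/Word2Vec')  # any w2v-format file
glove = GloVe(name='6B', dim=100)  # one of the bundled sets, downloaded on first use

vec = custom['hello']  # torch.Tensor of shape (custom.dim,); zeros for OOV by default
embedding = nn.Embedding.from_pretrained(custom.vectors)  # plug straight into a model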