
default embedding

Open nashid opened this issue 5 years ago • 5 comments

If we do not provide pretrained embeddings such as word2vec, how does the model represent the words?

Does it use one-hot encoding by default, or something like n-grams, CBOW, or skip-grams?

nashid avatar May 21 '20 01:05 nashid

No. If you do not provide pretrained embeddings, the framework creates a trainable variable and initializes it with some initialization scheme (e.g., random initialization). When you train the model on your data, this variable is updated too.
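For illustration, here is a minimal TensorFlow sketch of such a trainable, randomly initialized embedding table. The variable name, vocabulary size, and embedding size are made up for the example and are not taken from this repo's code:

```python
import tensorflow as tf

# Illustrative sizes, not this framework's defaults.
vocab_size = 10000
embedding_size = 256

# A trainable embedding table, initialized randomly (uniform here).
# Because it is trainable, gradients flow into it during training,
# so the embeddings are learned from your data.
embedding = tf.Variable(
    tf.random.uniform([vocab_size, embedding_size], -0.1, 0.1),
    name="embedding",
    trainable=True,
)
```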

luozhouyang avatar Jun 06 '20 03:06 luozhouyang

@luozhouyang I understand that if we do not provide pre-trained embeddings, the framework uses its own default embedding implementation.

However, I would like to know what algorithm is used to build the embedding.

nashid avatar Jun 07 '20 23:06 nashid

Word embeddings here are just a 2-D tensor with shape (vocab_size, embedding_size). This tensor is updated along with the other parameters by backpropagation.
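To make the backpropagation point concrete, a small sketch: rows of the table are looked up by token id, and gradients are computed with respect to the table itself. The token ids and the loss are stand-ins for illustration, not the framework's actual training code:

```python
import tensorflow as tf

vocab_size, embedding_size = 10000, 256  # illustrative values
embedding = tf.Variable(
    tf.random.uniform([vocab_size, embedding_size], -0.1, 0.1)
)

token_ids = tf.constant([[3, 17, 42]])  # a toy batch of token ids
with tf.GradientTape() as tape:
    # Look up one embedding vector per token: shape (1, 3, embedding_size).
    vectors = tf.nn.embedding_lookup(embedding, token_ids)
    # Stand-in for the real model loss.
    loss = tf.reduce_sum(vectors ** 2)

# Gradients w.r.t. the embedding table: non-zero only for rows 3, 17, 42,
# i.e. only the embeddings of tokens seen in the batch get updated.
grads = tape.gradient(loss, [embedding])
```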

luozhouyang avatar Jun 08 '20 00:06 luozhouyang

@luozhouyang I understand this. But what algorithm is it using (e.g., word2vec, GloVe, ...)?

nashid avatar Jul 16 '21 19:07 nashid

No special algorithm is used: not word2vec, not GloVe, just a learnable 2-D matrix.

luozhouyang avatar Jul 19 '21 01:07 luozhouyang