
Masked LM (MLM)

Open imsanjoykb opened this issue 3 years ago • 0 comments

Before feeding word sequences into BERT, 15% of the tokens in each sequence are selected for masking (most of these are replaced with a [MASK] token). The model then attempts to predict the original value of the masked words, based on the context provided by the other, non-masked words in the sequence. In technical terms, predicting the output words requires:

1. Adding a classification layer on top of the encoder output.
2. Multiplying the output vectors by the embedding matrix, transforming them into the vocabulary dimension.
3. Calculating the probability of each word in the vocabulary with softmax.
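To make the three steps concrete, here is a minimal PyTorch-style sketch of an MLM prediction head. The layer names, sizes, and random "encoder output" are illustrative assumptions for this example, not BERT's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed sizes for the sketch (BERT-base uses 768 hidden units and a ~30k vocabulary)
vocab_size, hidden_size = 30522, 768

# Token embedding matrix (reused for the output projection via weight tying)
embedding = nn.Embedding(vocab_size, hidden_size)

# Step 1: a classification/transform layer on top of the encoder output
transform = nn.Linear(hidden_size, hidden_size)

# Step 2: multiply by the embedding matrix to map back to the vocabulary dimension
decoder = nn.Linear(hidden_size, vocab_size, bias=False)
decoder.weight = embedding.weight  # weight tying with the input embeddings

# Stand-in encoder output: batch of 2 sequences, 16 tokens, hidden_size features each
encoder_output = torch.randn(2, 16, hidden_size)

hidden = F.gelu(transform(encoder_output))   # (2, 16, hidden_size)
logits = decoder(hidden)                     # (2, 16, vocab_size)

# Step 3: softmax over the vocabulary gives a probability for every word
probs = F.softmax(logits, dim=-1)
predicted_ids = probs.argmax(dim=-1)         # predicted token id at each (masked) position
print(probs.shape, predicted_ids.shape)      # torch.Size([2, 16, 30522]) torch.Size([2, 16])
```

During pre-training, the loss is computed only at the masked positions (e.g. with cross-entropy over `logits` at those positions), so the model is never penalized for its predictions on the unmasked tokens.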
