BERT-pytorch
chooses 15% of tokens
The paper says:
Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy.
This means that exactly 15% of tokens should be chosen.
However, in https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68, every single token independently has a 15% chance of going through the follow-up masking procedure. Does that align with "15% of tokens will be chosen"?
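To illustrate the distinction being raised, here is a minimal sketch (not the repository's actual code) contrasting the two sampling strategies: per-token Bernoulli sampling, where each token is independently masked with probability 0.15 and the masked count varies, versus choosing exactly 15% of positions, a literal reading of the paper:

```python
import random

tokens = ["my", "dog", "is", "hairy"] * 25  # 100 tokens for easy percentages
mask_rate = 0.15

# Strategy 1: per-token Bernoulli sampling (what dataset.py#L68 does).
# Each token is masked independently, so the masked count only averages 15.
bernoulli_masked = [i for i, _ in enumerate(tokens) if random.random() < mask_rate]

# Strategy 2: fixed-fraction sampling. Exactly 15% of positions are chosen.
k = int(len(tokens) * mask_rate)
fixed_masked = random.sample(range(len(tokens)), k)

print(len(bernoulli_masked))  # varies: binomially distributed around 15
print(len(fixed_masked))      # always 15
```

Both strategies mask 15% of tokens in expectation, so in practice the difference is mainly variance in the per-sentence masked count; the discrepancy with the paper's wording is what this issue points out.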
Sorry for the late response. I think you are right; I'll fix it ASAP.