nlp-tutorial

Some problems with BERT

Open · tfighting opened this issue 5 years ago · 2 comments

Line 70: index = randint(0, vocab_size - 1)  # random index in vocabulary. I think the replacement index should not be allowed to pick '[CLS]', '[SEP]', or '[MASK]'.

tfighting, Nov 06 '19 08:11

Yes, that's right. So the code should be changed like this:

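from random import random, randint

p = random()  # draw once, so the branches really get 80% / 10% / 10%
if p < 0.8:  # 80%
    input_ids[pos] = word_dict['[MASK]']  # make mask
elif p > 0.9:  # 10%: replace with a random token
    index = randint(0, vocab_size - 1)
    while index < 4:  # ids 0-3 are special tokens {'[PAD]': 0, '[CLS]': 1, '[SEP]': 2, '[MASK]': 3}, so draw again
        index = randint(0, vocab_size - 1)
    input_ids[pos] = index
# remaining 10% (0.8 <= p <= 0.9): keep the original token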

bruce1408, Mar 26 '21 22:03
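For reference, a self-contained sketch of the whole masking step. It assumes the special-token ids listed above (0-3), and `mask_tokens` is a hypothetical helper name, not a function from nlp-tutorial. One random draw per position gives the 80% / 10% / 10% split described in the BERT paper ([MASK] / random token / unchanged), and the random branch re-draws until it lands on an ordinary token.

```python
import random

# Assumed layout from the thread: ids 0-3 are the special tokens.
SPECIAL_IDS = {'[PAD]': 0, '[CLS]': 1, '[SEP]': 2, '[MASK]': 3}
NUM_SPECIAL = len(SPECIAL_IDS)


def mask_tokens(input_ids, masked_pos, vocab_size, rng=random):
    """Hypothetical helper: apply the 80/10/10 rule at each position in masked_pos.

    Returns a new list and leaves input_ids untouched. `rng` can be the
    random module itself or a seeded random.Random instance.
    """
    ids = list(input_ids)
    for pos in masked_pos:
        p = rng.random()  # a single draw decides the branch for this position
        if p < 0.8:  # 80%: replace with [MASK]
            ids[pos] = SPECIAL_IDS['[MASK]']
        elif p < 0.9:  # 10%: replace with a random ordinary token
            index = rng.randint(0, vocab_size - 1)
            while index < NUM_SPECIAL:  # re-draw if we hit [PAD]/[CLS]/[SEP]/[MASK]
                index = rng.randint(0, vocab_size - 1)
            ids[pos] = index
        # remaining 10%: keep the original token
    return ids


if __name__ == "__main__":
    rng = random.Random(0)
    print(mask_tokens([1, 10, 11, 12, 2], masked_pos=[1, 2, 3], vocab_size=20, rng=rng))
```

Passing a seeded random.Random makes the masking reproducible, which helps when debugging the data pipeline.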

How about just: index = randint(4, vocab_size - 1)

lukysummer, Mar 08 '22 05:03
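On the last suggestion: Python's random.randint(a, b) includes both endpoints, so randint(4, vocab_size - 1) samples uniformly from ids 4 to vocab_size - 1. That is the same distribution bruce1408's re-draw loop produces, with one call instead of a loop. A quick empirical check, assuming ids 0-3 are the special tokens and a hypothetical vocab_size of 10:

```python
import random
from collections import Counter

VOCAB_SIZE = 10  # hypothetical small vocabulary for illustration
NUM_SPECIAL = 4  # [PAD], [CLS], [SEP], [MASK] occupy ids 0-3


def redraw_until_ordinary(rng):
    """Rejection sampling, as in bruce1408's loop."""
    index = rng.randint(0, VOCAB_SIZE - 1)
    while index < NUM_SPECIAL:
        index = rng.randint(0, VOCAB_SIZE - 1)
    return index


def shifted_range(rng):
    """Single call, as suggested above; randint's upper bound is inclusive."""
    return rng.randint(NUM_SPECIAL, VOCAB_SIZE - 1)


rng = random.Random(42)
n = 60_000
a = Counter(redraw_until_ordinary(rng) for _ in range(n))
b = Counter(shifted_range(rng) for _ in range(n))
print(sorted(a))  # → [4, 5, 6, 7, 8, 9]
print(sorted(b))  # → [4, 5, 6, 7, 8, 9]
```

Both versions assume vocab_size > 4: with no ordinary tokens the loop never terminates, and randint(4, vocab_size - 1) raises ValueError.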