what's the probability threshold of ContextualWordEmbsAug?
Hello! This repository has helped me a lot! When using ContextualWordEmbsAug (e.g., with BERT), a word in the input text is replaced with [MASK] and then filled in by a masked language model. I'm wondering: what is the probability threshold of ContextualWordEmbsAug? For example, does the [MASK] only get replaced by a candidate word if its probability exceeds 80%, 60%, or some other value? Thank you! ^_^
There is no threshold. Once a token is replaced by [MASK], it will be substituted by one of the candidate words predicted by the masked language model, unless there are no candidates.
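For reference, here is a minimal usage sketch. It assumes nlpaug's documented constructor parameters (`model_path`, `action`, `aug_p`, `top_k`); check the current docs for exact names and defaults. Candidates are drawn from the model's top-k predictions rather than filtered by a probability cutoff:

```python
# Minimal sketch, assuming nlpaug's documented constructor parameters.
import nlpaug.augmenter.word as naw

aug = naw.ContextualWordEmbsAug(
    model_path='bert-base-uncased',  # masked language model to use
    action='substitute',             # replace existing tokens (vs. 'insert')
    aug_p=0.3,                       # fraction of tokens to augment
    top_k=100,                       # sample from the top-k candidates; no probability threshold
)
print(aug.augment('The quick brown fox jumps over the lazy dog'))
```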
Hi Edward!
Thanks for your contribution; it has really helped me a lot!
A question about the substitution mechanism implemented in ContextualWordEmbsAug. Suppose we are going to substitute half of the tokens in a sentence (i.e., aug_p=0.5). Does that mean half of the tokens are replaced by [MASK] at the same time and substituted simultaneously? Or is only one token replaced per step, with the process repeated until half of the tokens are replaced?
Thank you!
NLPAug uses the second approach, replacing tokens one by one. The first approach would provide a much faster response time, but it does not make sense (at least I believe), because we would be using one [MASK] to predict another [MASK]. For example, the input could be: the quick [MASK] [MASK] jumps over the lazy [MASK]
Here is the pseudocode for what NLPAug is doing:
Raw input: the quick brown fox jumps over the lazy dog
Target tokens: brown, fox, dog

First iteration
Input: the quick brown fox jumps over the lazy [MASK]
Output: the quick brown fox jumps over the lazy cat

Second iteration
Input: the quick brown [MASK] jumps over the lazy cat
Output: the quick brown tiger jumps over the lazy cat

Third iteration
Input: the quick [MASK] tiger jumps over the lazy cat
Output: the quick red tiger jumps over the lazy cat
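To make the loop concrete, here is a short illustrative sketch of the one-token-at-a-time strategy. This is not nlpaug's actual implementation; it uses Hugging Face's fill-mask pipeline, and the helper `substitute_one_by_one` and the hard-coded target indices are hypothetical:

```python
# Illustrative sketch of the one-token-at-a-time strategy described above;
# not nlpaug's actual code. Uses Hugging Face's fill-mask pipeline.
from transformers import pipeline

fill_mask = pipeline('fill-mask', model='bert-base-uncased')

def substitute_one_by_one(tokens, target_indices):
    """Mask and replace one target token per iteration."""
    for idx in target_indices:
        tokens[idx] = fill_mask.tokenizer.mask_token  # insert [MASK]
        masked_text = ' '.join(tokens)
        prediction = fill_mask(masked_text)[0]        # top candidate
        tokens[idx] = prediction['token_str']
    return ' '.join(tokens)

tokens = 'the quick brown fox jumps over the lazy dog'.split()
# Replace "dog", "fox", "brown" (indices 8, 3, 2), in the order shown above.
print(substitute_one_by_one(tokens, [8, 3, 2]))
```

Each iteration sees the previous iteration's substitutions as context, so later predictions are conditioned on earlier replacements rather than on other [MASK] tokens.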
Thanks for your prompt reply. It's clear now!