
Add README for the MLM study

Open amarasovic opened this issue 4 years ago • 0 comments

We want to report two issues that could affect the reproducibility of the masked LM loss calculation at test time.

First, we do not get exactly the same results reported in Table 3 of the paper when we use the fairseq library instead of the transformers library, after converting the transformers checkpoint to a fairseq checkpoint.
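For reference, a minimal sanity check along these lines (a sketch, not the script we ran: the checkpoint path, `model.pt` filename, example sentence are placeholders, and `roberta-base` stands in for whichever transformers checkpoint was converted) is to compare the final-layer features the two libraries produce on the same input; if the conversion were exact, they should match up to numerical precision:

```python
import torch
from transformers import RobertaTokenizer, RobertaModel as HFRoberta
from fairseq.models.roberta import RobertaModel as FairseqRoberta

# transformers side: the original (pre-conversion) checkpoint
hf_tok = RobertaTokenizer.from_pretrained("roberta-base")
hf_model = HFRoberta.from_pretrained("roberta-base").eval()

# fairseq side: the converted checkpoint (placeholder path)
fs_model = FairseqRoberta.from_pretrained(
    "/path/to/converted-checkpoint", checkpoint_file="model.pt"
).eval()

sentence = "The sound quality of these headphones is great."

with torch.no_grad():
    hf_out = hf_model(**hf_tok(sentence, return_tensors="pt")).last_hidden_state
    fs_out = fs_model.extract_features(fs_model.encode(sentence))

# Both tokenizers use the GPT-2 BPE and add <s>/</s>, so the shapes should
# agree; a non-negligible max difference points at the conversion itself.
print((hf_out - fs_out).abs().max())
```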

A related pull request was opened and closed, but it did not fix our problem.

Second, the results in Table 3 were calculated with a batch size of 1. With batch sizes larger than 1, we do not get the same results; in particular, the results change for a sample of reviews. As we have already mentioned, reviews are much shorter than documents from other domains. Unlike documents in other domains, which usually fill the maximum sequence length, reviews need to be padded up to it. For this reason, we suspect that padding somehow influences the masked LM loss calculation (see the sketch below). With a batch size of 1 no padding is needed, so we consider the results in Table 3 reliable.
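For concreteness, the setting we trust is the batch-size-1 evaluation, where every token in the sequence is real and only the masked positions contribute to the loss (labels at all other positions set to -100). The sketch below illustrates that setting with the transformers library; the model name, masking probability, and example review are placeholders rather than our exact evaluation script:

```python
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base").eval()

def masked_lm_loss(text, mask_prob=0.15, seed=0):
    """Masked LM loss for a single document, batch size 1 (no padding)."""
    enc = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    input_ids = enc["input_ids"].clone()
    labels = input_ids.clone()

    # Randomly choose positions to mask, never masking special tokens.
    torch.manual_seed(seed)
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(
            labels[0].tolist(), already_has_special_tokens=True
        ),
        dtype=torch.bool,
    )
    probs = torch.full(labels.shape, mask_prob)
    probs[0, special] = 0.0
    masked = torch.bernoulli(probs).bool()

    labels[~masked] = -100                       # ignored by the loss
    input_ids[masked] = tokenizer.mask_token_id  # replace with <mask>

    with torch.no_grad():
        out = model(input_ids=input_ids,
                    attention_mask=enc["attention_mask"],
                    labels=labels)
    return out.loss.item()  # NaN if nothing was masked (very short inputs)

print(masked_lm_loss("This product exceeded my expectations."))
```

With padded batches, the pad positions would additionally need their labels set to -100 and their attention-mask entries set to 0; if either step is handled differently between libraries or code paths, the averaged loss shifts most for short documents such as reviews, which would be consistent with what we observe.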

amarasovic · Apr 21 '20, 18:04