mirror-bert
Results for English STS using BERT-mp
Hi, I've been reproducing your experiments recently, and when I evaluated BERT-mp on English STS, the results I got were much better than those reported in the paper. I also ran the code you provided here to check, and the results are still better than the paper shows. However, when I use your code to evaluate BERT-CLS on English STS, I get the same results as the paper reports. So I was wondering whether there is something wrong with the results for BERT-mp on English STS?
Hi, if you check out https://github.com/cambridgeltl/mirror-bert/blob/8b6b9de97f6e9ba62310949240f1e08556887784/evaluation/eval.py#L19, there are two options for mp: `mean` and `mean_std`, where the first one considers padding tokens too. I used `mean` for reporting results, which might have caused the discrepancy.
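For what it's worth, here is a minimal sketch of the distinction being described, i.e. a mean over all positions (padding included) versus a mask-aware mean that excludes padding. The function names and shapes below are hypothetical and not taken from eval.py; the exact behaviour of `mean_std` in the repo may differ from this.

```python
import torch


def pool_mean(token_embeddings: torch.Tensor) -> torch.Tensor:
    # Simple mean over the sequence dimension; padding positions are
    # included in the average (matches the stated behaviour of "mean").
    # token_embeddings: (batch, seq_len, hidden)
    return token_embeddings.mean(dim=1)


def pool_mean_masked(token_embeddings: torch.Tensor,
                     attention_mask: torch.Tensor) -> torch.Tensor:
    # Mask-aware mean: padding positions are excluded from the average.
    # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding.
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)    # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # (batch, 1)
    return summed / counts
```

With short sentences padded to a long max length, the two poolings can give noticeably different sentence embeddings, which is one way the reported STS numbers could diverge.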
Thanks for your reply. I also used `mean`, so I'm still confused about the difference.😫