mirror-bert
Results for English STS using BERT-mp
Hi, I've been reproducing your experiments recently, and when I evaluated BERT-mp on English STS, the results I got were much better than those reported in the paper. I also ran the code you provided here to check, and the results are still better than the paper shows. However, when I use your code to evaluate BERT-CLS on English STS, I get the same results as the paper reports. So I was wondering whether there is something wrong with the results for BERT-mp on English STS?
Hi, if you check out https://github.com/cambridgeltl/mirror-bert/blob/8b6b9de97f6e9ba62310949240f1e08556887784/evaluation/eval.py#L19, there are two options for mp: `mean` and `mean_std`, where the first one considers padding tokens too. I used `mean` for reporting results, which might have caused the discrepancy.
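For what it's worth, here is a minimal sketch of the distinction being described, i.e. a mean over all positions (padding included) versus a mask-aware mean that excludes padding. The function names and shapes below are hypothetical and not taken from eval.py; the exact behaviour of `mean_std` in the repo may differ from this.

```python
import torch


def pool_mean(token_embeddings: torch.Tensor) -> torch.Tensor:
    # Simple mean over the sequence dimension; padding positions are
    # included in the average (matches the stated behaviour of "mean").
    # token_embeddings: (batch, seq_len, hidden)
    return token_embeddings.mean(dim=1)


def pool_mean_masked(token_embeddings: torch.Tensor,
                     attention_mask: torch.Tensor) -> torch.Tensor:
    # Mask-aware mean: padding positions are excluded from the average.
    # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding.
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)    # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # (batch, 1)
    return summed / counts
```

With short sentences padded to a long max length, the two poolings can give noticeably different sentence embeddings, which is one way the reported STS numbers could diverge.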
Thanks for your reply. I also used `mean`, so I'm still confused about the difference.😫