interpret_bert
WC task BERT accuracy is much lower than the paper claims
I have tried using the SentEval WC dataset to evaluate BERT's performance, but the result is much lower than the paper reports (0.4 compared to 24.9 for layer 0). I achieve performance similar to the paper on all 9 other tasks.
It has been about a month and still no one has replied ... The issue is relatively critical: I have tried different language models with the current codebase, and none of them gives a reasonable result on this task. Is there a reason behind that? I would be very grateful if someone could reply.
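For context (my own sanity check, not from this codebase): SentEval's Word Content task is a 1000-way classification, so a probe that learned nothing scores around 0.1% accuracy. A result of 0.4 is therefore barely above chance, which usually points to a bug in feature extraction or label alignment rather than a genuinely weak representation. A minimal sketch of that baseline comparison, using synthetic labels:

```python
import random

# The SentEval WC probing task has 1000 target word classes, so the
# chance-level accuracy for a classifier that ignores its input is:
num_classes = 1000
chance_accuracy = 100.0 / num_classes  # in percent -> 0.1%

# Simulate a random-guessing probe over a synthetic label set to see
# that its accuracy lands near the chance level above.
random.seed(0)
n_samples = 10_000
labels = [random.randrange(num_classes) for _ in range(n_samples)]
guesses = [random.randrange(num_classes) for _ in range(n_samples)]
random_acc = 100.0 * sum(l == g for l, g in zip(labels, guesses)) / n_samples

print(f"chance accuracy: {chance_accuracy:.2f}%")
print(f"simulated random-guess accuracy: {random_acc:.2f}%")
```

If the reported 0.4 is on the same percent scale as the paper's 24.9, it sits in this near-chance regime, which is why I suspect something systematic rather than noise.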