legacy metric 'Best' are inaccurate

Open Hyfred opened this issue 2 years ago • 0 comments

Thanks for your code and contribution. I've conducted some testing on the legacy metrics, and it appears there might be an issue with the legacy metric-'best' score. When testing with the 'LS07' test data utilizing bert-ls model with eval process, I obtained a 'best' score of 1.12, whereas with the 'ls14' test data, the 'best' score was 1.18. This is significantly different from the previous results. I believe there might be an issue with the metric calculations.

Sep 17 '23 19:09 Hyfred