swords
Legacy metric 'best' is inaccurate
Thanks for your code and contribution. I've been testing the legacy metrics, and the legacy 'best' score appears to be off. Running the eval process with the bert-ls model, I obtained a 'best' score of 1.12 on the 'LS07' test data and 1.18 on the 'ls14' test data. Both values differ significantly from the previous results, so I believe there is an issue in how the metric is calculated.
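For reference, here is a minimal sketch of the 'best' score as I understand it from the SemEval-2007 lexical substitution task definition: for each item, the gold frequencies of the system's guesses are summed, divided by the number of guesses and by the total gold frequency, and the result is averaged over attempted items and reported as a percentage. This is not the code in this repository; the function names `best_item_score` and `best_precision` and the toy data are hypothetical, and the sketch is only meant as a sanity check against which the legacy output could be compared.

```python
def best_item_score(guesses, gold):
    """Credit for one item: gold frequency of the guesses,
    normalised by the number of guesses and the total gold frequency."""
    if not guesses or not gold:
        return 0.0
    total_gold = sum(gold.values())
    credit = sum(gold.get(g, 0) for g in guesses)
    return credit / (len(guesses) * total_gold)


def best_precision(items):
    """Average item score over attempted items, as a percentage.

    `items` is a list of (guesses, gold) pairs, where `guesses` is a list
    of substitutes and `gold` maps each gold substitute to its annotator count.
    """
    scores = [best_item_score(g, h) for g, h in items if g]
    if not scores:
        return 0.0
    return 100.0 * sum(scores) / len(scores)


# Toy data (made up for illustration, not taken from LS07 or ls14).
toy_items = [
    (["happy"], {"happy": 3, "cheerful": 2}),         # 3 / (1 * 5)
    (["happy", "sad"], {"happy": 3, "cheerful": 2}),  # 3 / (2 * 5)
]
print(round(best_precision(toy_items), 2))
```

If the legacy metric is meant to follow this definition, comparing its per-item output with a sketch like this on a handful of examples may help locate where the two diverge, for instance in the percentage scaling or in the normalisation by the number of guesses.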