geval
Fluency outcome different from prompt instructions
According to the prompt instructions, fluency is the only dimension rated on a 1-3 scale; the others are rated 1-5. The output in the summeval.json file, however, shows fluency consistently rated on a 1-5 scale, which does not follow the prompt instructions. Furthermore, even if fluency were scored 1-3 as instructed, averaging it with the other dimensions into an overall score on a 1-5 scale would be misleading, since a perfect fluency score of 3 would still bring the overall score below 5.
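To make the averaging concern concrete, here is a minimal sketch. It assumes the overall score is a plain mean of four dimensions; the dimension names, the `SCALES` table, and the `rescale_to_1_5` helper are hypothetical and are not taken from the repository code.

```python
from statistics import mean

# Assumed scale bounds per dimension, following the prompt instructions
# described above (fluency 1-3, the others 1-5). Names are illustrative.
SCALES = {
    "coherence": (1, 5),
    "consistency": (1, 5),
    "fluency": (1, 3),
    "relevance": (1, 5),
}

def overall(scores):
    """Plain average across dimensions (an assumption about how 'overall' is computed)."""
    return mean(scores.values())

def rescale_to_1_5(value, low, high):
    """Linearly map a score from [low, high] onto [1, 5]."""
    return 1 + 4 * (value - low) / (high - low)

# A summary that gets the best possible score on every dimension.
best = {name: high for name, (low, high) in SCALES.items()}

# Averaging the raw scores: the fluency cap of 3 pulls the mean below 5.
naive = overall(best)

# Averaging after mapping every dimension onto 1-5.
rescaled = overall(
    {name: rescale_to_1_5(best[name], *SCALES[name]) for name in SCALES}
)

print(naive, rescaled)
```

Under these assumptions, a summary that is perfect on every dimension averages 4.5 on the raw scores and 5.0 once fluency is mapped onto the same range as the others.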
My suggestion would be to change the fluency scale to 1-5 and update the prompt accordingly.