sacrerouge
PythonRouge does not have a scoring option
The original ROUGE script has a scoring option, -f A|B, where A averages over the models (the reference summaries) and B takes the maximum. PythonRouge should implement the same functionality. The logic should be identical to ROUGE's, so we first need to understand the implementation details: how does it compute the "best" model? Is the maximum taken per metric, or does it pick one metric to select a model and then use that model's precision, recall, and f1?
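To make the open question concrete, here is a minimal sketch of the candidate behaviours in Python. It is not PythonRouge's API and it does not claim to reproduce what the ROUGE script actually does. The function names, the per-reference score dictionaries, the `select_by` parameter, and the example numbers are all hypothetical, and averaging already-computed per-reference scores is itself an assumption about what A means.

```python
from typing import Dict, List

Scores = Dict[str, float]  # hypothetical shape: {"precision": ..., "recall": ..., "f1": ...}


def aggregate_average(per_reference: List[Scores]) -> Scores:
    """Candidate for '-f A': average each metric over all references."""
    n = len(per_reference)
    return {k: sum(s[k] for s in per_reference) / n for k in per_reference[0]}


def aggregate_max_per_metric(per_reference: List[Scores]) -> Scores:
    """Candidate 1 for '-f B': take the maximum of each metric independently.

    The returned precision, recall, and f1 may come from different references.
    """
    return {k: max(s[k] for s in per_reference) for k in per_reference[0]}


def aggregate_best_reference(per_reference: List[Scores], select_by: str = "recall") -> Scores:
    """Candidate 2 for '-f B': pick one reference by a single metric and
    report all of that reference's scores together.

    Defaulting `select_by` to recall is an assumption, not confirmed ROUGE behaviour.
    """
    return dict(max(per_reference, key=lambda s: s[select_by]))


# Made-up scores of one summary against two references.
per_reference = [
    {"precision": 0.5, "recall": 0.5, "f1": 0.5},
    {"precision": 1.0, "recall": 0.25, "f1": 0.4},
]

average = aggregate_average(per_reference)
max_per_metric = aggregate_max_per_metric(per_reference)
best_by_recall = aggregate_best_reference(per_reference, select_by="recall")
best_by_precision = aggregate_best_reference(per_reference, select_by="precision")
```

With these numbers the two B candidates disagree: the per-metric maximum pairs the second reference's precision with the first reference's recall, while selecting a single reference keeps the three values consistent with each other. Which of these (if either) matches ROUGE is exactly what needs to be checked against the original script.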