guided_summarization Guidance ROUGE

Guidance ROUGE

Open yumoxu opened this issue 3 years ago • 2 comments

Hi @zdou0830!

I just measured ROUGE scores for the guidance sentences you provided, and got the following result:

R1/R2/RL: 43.89/20.63/39.79

This is a bit different from the MatchSum result reported in the paper:

R1/R2/RL: 44.41/20.86/40.55

I used files2rouge for ROUGE evaluation, as suggested in BART for summarization, with the default settings of -c 95 -r 1000 -n 2 -a.

Could you please check this result on your end? Maybe I missed something.

Thank you!

Apr 05 '21 22:04 yumoxu

Hi @yumoxu,

Thanks! In this case, I think the difference arises mainly from sentence splitting. When evaluating abstractive models, files2rouge treats "." as a sentence boundary, but when evaluating extractive models, we treated each extracted sentence as a single sentence (regardless of whether it ends with ".", and even if it contains multiple "."). The results are taken directly from the MatchSum paper, and I'll double-check that.
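To illustrate why sentence boundaries matter, here is a minimal sketch of a simplified summary-level ROUGE-L (union LCS, no stemming, no clipping). It is not the code used by files2rouge or ROUGE-1.5.5, and the helper names (`tokens`, `split_on_period`, `rouge_l_f1`) and the toy texts are hypothetical. It only shows that the same pair of texts can score differently depending on how they are segmented into sentences.

```python
def tokens(text):
    """Lowercase whitespace tokenization that drops standalone periods."""
    return [t for t in text.lower().split() if t != "."]


def split_on_period(text):
    """Treat every "." as a sentence boundary (assumed splitting rule)."""
    return [tokens(s) for s in text.split(".") if s.strip()]


def lcs_ref_indices(ref, cand):
    """Return the set of positions in `ref` that belong to one LCS with `cand`."""
    m, n = len(ref), len(cand)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if ref[i] == cand[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Trace back through the table to recover the matched reference positions.
    matched, i, j = set(), m, n
    while i > 0 and j > 0:
        if ref[i - 1] == cand[j - 1]:
            matched.add(i - 1)
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return matched


def rouge_l_f1(ref_sents, cand_sents):
    """Simplified summary-level ROUGE-L F1 based on union LCS per reference sentence."""
    hits = 0
    for ref in ref_sents:
        union = set()
        for cand in cand_sents:
            union |= lcs_ref_indices(ref, cand)
        hits += len(union)
    if hits == 0:
        return 0.0
    precision = hits / sum(len(s) for s in cand_sents)
    recall = hits / sum(len(s) for s in ref_sents)
    return 2 * precision * recall / (precision + recall)


# Toy texts (made up for illustration): same two sentences, opposite order.
reference = "the mayor resigned on monday . the council will elect a successor"
candidate = "the council will elect a successor . the mayor resigned on monday"

# Setting A: each text is scored as one long sentence.
whole = rouge_l_f1([tokens(reference)], [tokens(candidate)])
# Setting B: "." is treated as a sentence boundary on both sides.
split = rouge_l_f1(split_on_period(reference), split_on_period(candidate))

print(f"one sentence per text: {whole:.4f}")
print(f"split on periods:      {split:.4f}")
```

In this toy case, splitting on "." lets each reference sentence match its counterpart regardless of order, so the split score is higher than the single-sentence score. R1 and R2 are far less sensitive to segmentation than RL, so the gap would be expected to show up mostly in RL.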

Apr 05 '21 22:04 zdou0830

Thanks for your swift reply!

I just wanted to confirm that there is nothing wrong with my evaluation setup :) And this makes a lot of sense!

Apr 05 '21 23:04 yumoxu
