BARTScore Spearman Correlations for Table-4

Open Atharva-Phatak opened this issue 3 years ago • 3 comments

In Table-4 of the paper, you report COH, FAC, FLU, and INFO on the SummEval dataset. I wanted to know which BARTScore variant you used for each of them.

From my understanding of the paper, for factuality (FAC) you must have used BARTScore(s->h), i.e. source -> hypothesis.

But I am not clear about FLU, COH, and INFO.

If you could elaborate, that would be really helpful.

Jul 09 '22 20:07 Atharva-Phatak

On the SummEval dataset, for FLU, COH and INFO, we also used BARTScore(s->h).
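
As an illustration only (this is not the authors' evaluation script), the sketch below shows how a single list of s->h scores could be correlated against human ratings for each aspect using Spearman's rank correlation. The helper names `average_ranks` and `spearman` are hypothetical, and every number is a made-up toy value, not a result from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized lists with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Toy values (assumption): one s->h score per summary, reused for every aspect.
s_to_h_scores = [-2.1, -1.4, -3.0, -1.9, -2.6]
human_ratings = {
    "COH": [3.0, 4.5, 1.5, 3.5, 2.0],
    "FAC": [4.0, 5.0, 2.0, 4.5, 3.0],
    "FLU": [4.5, 4.0, 3.0, 5.0, 3.5],
    "INFO": [3.5, 4.0, 2.0, 3.0, 2.5],
}

for aspect, ratings in human_ratings.items():
    print(aspect, round(spearman(s_to_h_scores, ratings), 3))
```

Because the same metric scores are reused for all four aspects, any difference between the reported correlations comes only from the human ratings for each aspect.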

Jul 10 '22 03:07 yyy-Apple

So what was the reason for using a single score (s->h)? Does BARTScore holistically measure the quality of generated text?

For example, can you report the s->h variant of BARTScore and conclude, on the basis of that score alone, that the summary generated by Model A is better overall than the one generated by Model B?

Also, how do you decide which BARTScore variant to use for a particular dataset to measure COH, FLU, INFO, and FAC?

Please let me know.

Jul 10 '22 14:07 Atharva-Phatak

Here are some rules we followed when deciding which BARTScore variant to use:

1. The definition of the evaluation perspective (for example, factuality must rely on the source document).
2. The modalities/languages supported by PLMs (for example, for data-to-text we can only use h<->r, because the source and the hypothesis are in different modalities).
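
The rules above come down to choosing which text the model conditions on and which text it scores. A minimal sketch of that choice follows, using the four directions named in the paper (s->h, r->h, h->r, and h<->r). The function `variant_pairs` and its argument names are hypothetical and are not part of the BARTScore repository.

```python
# Which text is conditioned on and which is scored, per variant.
# h<->r uses both reference-based directions.
DIRECTIONS = {
    "s->h": [("source", "hypothesis")],
    "r->h": [("reference", "hypothesis")],
    "h->r": [("hypothesis", "reference")],
    "h<->r": [("reference", "hypothesis"), ("hypothesis", "reference")],
}


def variant_pairs(variant, source=None, hypothesis=None, reference=None):
    """Return (conditioning_text, scored_text) pairs for a variant.

    Raises ValueError when the variant is unknown or needs a text that
    was not supplied, e.g. s->h without a textual source.
    """
    if variant not in DIRECTIONS:
        raise ValueError(f"unknown variant: {variant!r}")
    texts = {"source": source, "hypothesis": hypothesis, "reference": reference}
    pairs = []
    for conditioned_on, scored in DIRECTIONS[variant]:
        if texts[conditioned_on] is None or texts[scored] is None:
            raise ValueError(f"{variant} needs both {conditioned_on} and {scored}")
        pairs.append((texts[conditioned_on], texts[scored]))
    return pairs


# Summarization: the source is text, so s->h is available.
print(variant_pairs("s->h", source="long article ...", hypothesis="summary ..."))

# Data-to-text: the source is not plain text, so only reference-based
# directions apply.
print(variant_pairs("h<->r", hypothesis="generated text", reference="gold text"))
```

Each returned pair would then be passed to a scorer, such as the repository's `BARTScorer`; see its README for the exact call. How the two directions of h<->r are combined into one number is left out here.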

However, we agree that designing a metric with multiple interpretable dimensions would be promising future work.

Jul 14 '22 01:07 yyy-Apple
