Spearman Correlations for Table-4

Open Atharva-Phatak opened this issue 2 years ago • 3 comments

In Table 4 of the paper, you report COH, FAC, FLU, and INFO on the SummEval dataset. I wanted to know which variants of BARTScore you used for each.

From my understanding of the paper, for factuality (FAC) you must have used BARTScore(s->h), i.e. source -> hypothesis.

But I am not clear about FLU, COH, and INFO.

If you could please elaborate, that would be really helpful.

Atharva-Phatak avatar Jul 09 '22 20:07 Atharva-Phatak

On the SummEval dataset, for FLU, COH and INFO, we also used BARTScore(s->h).
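
As a rough illustration of what an s->h score is, the sketch below computes a weighted sum of target-token log-probabilities, which with uniform weights reduces to the mean log-likelihood of the hypothesis tokens given the source. The function name `bartscore_from_log_probs` and the probability values are hypothetical; in practice the per-token log-probabilities would come from a BART model conditioned on the source document.

```python
import math
from typing import Optional, Sequence


def bartscore_from_log_probs(
    token_log_probs: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of the log-probabilities of the generated tokens.

    With the default uniform weights (1 / number of tokens) this is the
    mean log-likelihood per token. Scores are <= 0; higher is better.
    """
    if not token_log_probs:
        raise ValueError("need at least one target token")
    if weights is None:
        weights = [1.0 / len(token_log_probs)] * len(token_log_probs)
    if len(weights) != len(token_log_probs):
        raise ValueError("weights and token_log_probs must have equal length")
    return sum(w * lp for w, lp in zip(weights, token_log_probs))


# Hypothetical p(hypothesis token | source, previous tokens) for two summaries.
well_supported = bartscore_from_log_probs([math.log(p) for p in (0.9, 0.8, 0.85)])
poorly_supported = bartscore_from_log_probs([math.log(p) for p in (0.9, 0.1, 0.2)])
print(well_supported > poorly_supported)
```

For s->h the conditioning text is the source and the scored tokens are those of the hypothesis; other variants only change which text plays which role.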

yyy-Apple avatar Jul 10 '22 03:07 yyy-Apple

So what was the reason for using a single score (s->h)? Does BARTScore holistically measure the quality of generated text?

For example, can you report the s->h variant of BARTScore and say that, on the basis of that score alone, the summary generated by Model A is better overall than the one generated by Model B?

Also, how do you decide which BARTScore variant to use on a particular dataset to measure COH, FLU, INFO, and FAC?

Please let me know.

Atharva-Phatak avatar Jul 10 '22 14:07 Atharva-Phatak

Here are the rules we followed when deciding which BARTScore variant to use:

  • The definition of the evaluation perspective (for example, factuality must rely on the source document).
  • The modalities and languages supported by the PLM (for example, for data-to-text we can only use h<->r, because the source and the hypothesis are in different modalities).

That said, we agree that designing a metric with multiple interpretable dimensions is a promising direction for future work.
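
The two rules above can be sketched as a small helper that computes only the variants the available inputs allow. This is an illustrative sketch, not the repository's code: `bartscore_variants`, the `Scorer` type, and `toy_overlap_scorer` are hypothetical names, and the toy scorer is a word-overlap stand-in for a real model-based log-likelihood. The precision (r->h), recall (h->r), and averaged (h<->r) directions follow the definitions in the paper.

```python
from typing import Callable, Dict, Optional

# (conditioning_text, generated_text) -> score; higher is better.
Scorer = Callable[[str, str], float]


def bartscore_variants(
    scorer: Scorer,
    hypothesis: str,
    source: Optional[str] = None,
    reference: Optional[str] = None,
) -> Dict[str, float]:
    """Compute the variants that the available texts permit."""
    scores: Dict[str, float] = {}
    if source is not None:
        # Source-grounded perspectives such as factuality need s->h.
        scores["s->h"] = scorer(source, hypothesis)
    if reference is not None:
        precision = scorer(reference, hypothesis)  # r->h
        recall = scorer(hypothesis, reference)     # h->r
        scores["r->h"] = precision
        scores["h->r"] = recall
        scores["h<->r"] = (precision + recall) / 2
    return scores


def toy_overlap_scorer(conditioning: str, generated: str) -> float:
    """Stand-in scorer (an assumption): minus the share of unseen words."""
    seen = set(conditioning.lower().split())
    words = generated.lower().split()
    return -sum(word not in seen for word in words) / len(words)


# Data-to-text style case: the source is not plain text, so pass only a reference.
scores = bartscore_variants(
    toy_overlap_scorer, "the cat sat", reference="the cat sat down"
)
print(scores)
```

When the source is structured data that the model cannot read as text, `source` is left as `None` and only the reference-based variants are returned.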

yyy-Apple avatar Jul 14 '22 01:07 yyy-Apple