
Question regarding semantic correctness of sentences

Open rphlstck opened this issue 2 years ago • 0 comments

Excuse me if this is not the right venue to ask this question, but maybe your expertise could help me out!

Consider the following candidates: 'My left ear hurts but my right eye is good' and 'My left eye is good but my right ear hurts.'

Now I calculate the BERTScore of each candidate against my ground truth: 'My right ear hurts but my left eye is good.'

I would expect the second sentence to yield a greater BERTScore, as it more correctly captures the underlying information of the ground truth sentence. However, this would require the score to operate at a higher level of abstraction than the token level (right?). Is there a way to adapt the BERTScore to achieve this? Or are you aware of a metric capable of capturing the meaning of a sentence?
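To make the token-level point concrete, here is a minimal sketch of the greedy cosine matching that BERTScore's precision and recall are built on. It is not the bert_score implementation: the helper names (`cosine`, `embed`, `greedy_prf`) and the 3-D vectors in `TOY_EMBEDDINGS` are made up for illustration, and the vectors are context-free, whereas BERTScore uses contextual embeddings from the model.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, float, float]

# Hypothetical, context-free toy embeddings (NOT real model outputs).
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "my": (1.0, 0.0, 0.0),
    "left": (0.0, 1.0, 0.2),
    "right": (0.0, 1.0, -0.2),
    "ear": (0.3, 0.0, 1.0),
    "eye": (-0.3, 0.0, 1.0),
    "hurts": (1.0, 1.0, 0.0),
    "but": (1.0, 0.0, 1.0),
    "is": (0.0, 1.0, 1.0),
    "good": (1.0, -1.0, 0.0),
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embed(sentence: str) -> List[Vector]:
    """Look up one toy vector per whitespace-separated, lowercased token."""
    return [TOY_EMBEDDINGS[tok] for tok in sentence.lower().rstrip(".").split()]


def greedy_prf(cand: List[Vector], ref: List[Vector]) -> Tuple[float, float, float]:
    """Greedy matching: each token is paired with its most similar token on the other side."""
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


ref = embed("My right ear hurts but my left eye is good.")
swapped = greedy_prf(embed("My left ear hurts but my right eye is good."), ref)
reordered = greedy_prf(embed("My left eye is good but my right ear hurts."), ref)
print("left/right swapped:", swapped)
print("clauses reordered: ", reordered)
```

Because both candidates contain exactly the same tokens as the reference, the matching step alone cannot tell them apart in this sketch: every token finds an identical partner. Any difference in the real scores therefore has to come from how context changes each token's embedding, not from the matching itself.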

These are the results I get from bert_score:

Reference: My right ear hurts but my left eye is good.
Candidate:
My left ear hurts but my right eye is good.
microsoft/deberta-xlarge-mnli_L40_no-idf_version=0.3.12(hug_trans=4.33.2): P=0.966590 R=0.951585 F=0.959029
Candidate:
My left eye is good but my right ear hurts.
microsoft/deberta-xlarge-mnli_L40_no-idf_version=0.3.12(hug_trans=4.33.2): P=0.895558 R=0.891666 F=0.893608

This is my code:

from typing import List
from bert_score import score

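def calc_score(refs: List[str], cands: List[str]) -> None:
    # score() takes the candidates first, then the references. With
    # return_hash=True it also returns a string identifying the model,
    # layer, and library versions used.
    (P, R, F), hashname = score(cands, refs, model_type='microsoft/deberta-xlarge-mnli', return_hash=True)
    print(f'Candidate:\n{cands[0]}')
    print(
        f"{hashname}: P={P.mean().item():.6f} R={R.mean().item():.6f} F={F.mean().item():.6f}"
    )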

def sentence_ordering():
    '''
    Calculate the BERTScore for two sentences. 
    In the first sentence the meaning is altered by switching "left"/"right".
    In the second sentence the meaning is preserved but the order is switched.
    '''
    cands = ['My left ear hurts but my right eye is good.', 
             'My left eye is good but my right ear hurts.']
    refs = ['My right ear hurts but my left eye is good.']

    print(f'Reference: {refs[0]}')
    for cand in cands:
        calc_score(refs, [cand])

if __name__ == "__main__":
    sentence_ordering()

Note that ChatGPT 3.5 gives the following answer.

Prompt:

Which of the following sentences (`Sentence 1` or `Sentence 2`) more accurately contains the information provided in the `Ground truth`?

Sentence 1: 'My left ear hurts but my right eye is good.'
Sentence 2: 'My left eye is good but my right ear hurts.'
Ground truth: 'My right ear hurts but my left eye is good.'

Answer:

Sentence 2 more accurately contains the information provided in the Ground truth. It correctly represents the information about the right ear hurting and the left eye being in good condition.

rphlstck, Sep 27 '23 14:09