bert_score
BERT simple true / false comparison scoring seems wrong
I am making a very basic call with simple "true"/"false" response variations, and I'm getting a very odd result that I don't quite understand.
Here's the code snippet that makes the BERT call
from typing import Tuple

from torch import Tensor
from bert_score import score  # pip install bert-score

# Note: score() returns a tuple of tensors, not a dict
bert_score: Tuple[Tensor, Tensor, Tensor] = score(
    ['false.'],
    ['false'],
    model_type='microsoft/deberta-xlarge-mnli',
    lang="en",
)
print(f'Score: {bert_score}')
The result for this call is:
Score: (tensor([0.7015]), tensor([0.7298]), tensor([0.7153]))
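For reference, the three tensors returned by `score` are precision, recall, and F1, in that order (this is the documented return order of `bert_score.score`). The third value is just the harmonic mean of the first two, which checks out against the numbers above:

```python
# The tuple returned by bert_score.score is (precision, recall, F1).
P, R = 0.7015, 0.7298  # values from the run above
f1 = 2 * P * R / (P + R)  # F1 is the harmonic mean of precision and recall
print(round(f1, 4))  # matches the third tensor up to display rounding
```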
But here's the odd part: if I make the call with a candidate of ['True'], it actually scores higher:
# Same call as above, but with 'True' as the candidate
bert_score: Tuple[Tensor, Tensor, Tensor] = score(
    ['True'],
    ['false'],
    model_type='microsoft/deberta-xlarge-mnli',
    lang="en",
)
print(f'Score: {bert_score}')
The result for this call is:
Score: (tensor([0.7599]), tensor([0.7599]), tensor([0.7599]))
This just seems flat-out wrong to me, and I'm wondering if someone can give me insight into what is happening. Thanks.
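For context, my understanding (an assumption on my part, not taken from the bert_score docs) is that BERTScore greedily matches each token's contextual embedding against the other sentence's tokens and averages the best cosine similarities, so two antonyms whose embeddings point in similar directions can still score high. A toy sketch of that matching with made-up 2-d "embeddings":

```python
# Toy sketch of BERTScore-style greedy matching (assumed mechanism, not the
# library's actual implementation). Precision/recall are the mean of each
# token's best cosine similarity against the other sentence's tokens.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def bertscore_f1(cand, ref):
    # For each candidate token, take its best match in the reference (precision),
    # and vice versa (recall); F1 is their harmonic mean.
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical single-token "embeddings": vectors for opposite words can still
# lie close together, because the model embeds them in similar contexts.
true_vec = [[0.9, 0.4]]
false_vec = [[0.8, 0.5]]
print(bertscore_f1(true_vec, false_vec))  # high score despite opposite meaning
```

This would explain a high score for 'True' vs 'false', but not why it beats 'false.' vs 'false'.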