bert_score
BERT simple true / false comparison scoring seems wrong
I am making a very basic call with simple "true"/"false" response variations, and I'm getting a very odd result that I don't quite understand.
Here's the code snippet that makes the BERT call
from typing import Tuple

from torch import Tensor
from bert_score import score  # pip install bert-score

# Note: score() returns a tuple of tensors, not a dict
bert_score: Tuple[Tensor, Tensor, Tensor] = score(
    ['false.'],
    ['false'],
    model_type='microsoft/deberta-xlarge-mnli',
    lang="en",
)
print(f'Score: {bert_score}')
The result for this call is:
Score: (tensor([0.7015]), tensor([0.7298]), tensor([0.7153]))
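For reference, the three tensors returned by `score` are precision, recall, and F1, in that order (this is the documented return order of `bert_score.score`). The third value is just the harmonic mean of the first two, which checks out against the numbers above:

```python
# The tuple returned by bert_score.score is (precision, recall, F1).
P, R = 0.7015, 0.7298  # values from the run above
f1 = 2 * P * R / (P + R)  # F1 is the harmonic mean of precision and recall
print(round(f1, 4))  # matches the third tensor up to display rounding
```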
But here's the odd part: if I make the call with a candidate of ['True'], it actually scores higher:
# Same call as above, but with 'True' as the candidate
bert_score: Tuple[Tensor, Tensor, Tensor] = score(
    ['True'],
    ['false'],
    model_type='microsoft/deberta-xlarge-mnli',
    lang="en",
)
print(f'Score: {bert_score}')
The result for this call is:
Score: (tensor([0.7599]), tensor([0.7599]), tensor([0.7599]))
This just seems flat-out wrong to me, and I'm wondering if someone can give me insight into what is happening. Thanks.
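For context, my understanding (an assumption on my part, not taken from the bert_score docs) is that BERTScore greedily matches each token's contextual embedding against the other sentence's tokens and averages the best cosine similarities, so two antonyms whose embeddings point in similar directions can still score high. A toy sketch of that matching with made-up 2-d "embeddings":

```python
# Toy sketch of BERTScore-style greedy matching (assumed mechanism, not the
# library's actual implementation). Precision/recall are the mean of each
# token's best cosine similarity against the other sentence's tokens.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def bertscore_f1(cand, ref):
    # For each candidate token, take its best match in the reference (precision),
    # and vice versa (recall); F1 is their harmonic mean.
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical single-token "embeddings": vectors for opposite words can still
# lie close together, because the model embeds them in similar contexts.
true_vec = [[0.9, 0.4]]
false_vec = [[0.8, 0.5]]
print(bertscore_f1(true_vec, false_vec))  # high score despite opposite meaning
```

This would explain a high score for 'True' vs 'false', but not why it beats 'false.' vs 'false'.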