Improve no_answer values when `use_confidence_scores=True` in `FARMReader`
Is your feature request related to a problem? Please describe.
In #2853 it turned out that when setting `use_confidence_scores=True` we cannot expect the same ranking as with `use_confidence_scores=False`:
- transforming regular answer scores to confidence values keeps the order within regular answers
- transforming no_answer scores to confidence values often produces a different order
This is because calculating the confidence scores for no_answers does not take the original per-sequence softmaxed logits of the model output into account, but simply scales the per-query no_answer logit score via sigmoid: https://github.com/deepset-ai/haystack/blob/1f5b9bd69b42209a2f276ba848988243253e9bc7/haystack/nodes/reader/base.py#L33-L55
This produces some funny situations: even though the model calculated a per-sequence confidence score of about 1%, the returned confidence score could be above 50-60% (even if only one sequence was considered).
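To make the mismatch concrete, here is a minimal numerical sketch; the logit values are invented and the sigmoid scaling only approximates what the linked base.py code does with the aggregated no_answer score:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Invented logits for one sequence: [no_answer, answer_span_1, answer_span_2].
logits = np.array([0.5, 5.0, 4.5])

# Per-sequence view: softmax over all candidates of this sequence.
# The no_answer candidate gets well under 1% of the probability mass.
per_sequence_probs = np.exp(logits) / np.exp(logits).sum()
print(per_sequence_probs[0])   # ~0.007

# Per-query view as currently returned: a sigmoid over the no_answer score alone
# (approximated here by the raw no_answer logit plus a scaling constant).
print(sigmoid(logits[0] / 8))  # ~0.52, i.e. above 50% despite <1% softmax mass
```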
Additional context: The calculation of the original per-sequence confidence value within the model is done here: https://github.com/deepset-ai/haystack/blob/1f5b9bd69b42209a2f276ba848988243253e9bc7/haystack/modeling/model/prediction_head.py#L553
The calculation of the per-document confidence value within the model is done here; this might be an option for FARMReader to use if we propagate the required values accordingly: https://github.com/deepset-ai/haystack/blob/1f5b9bd69b42209a2f276ba848988243253e9bc7/haystack/modeling/model/prediction_head.py#L807
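For reference, a rough sketch of what a per-sequence no_answer confidence amounts to, assuming a temperature-scaled softmax over start/end logits as in the linked prediction head; the function name, the exact aggregation of start and end probabilities, and the [CLS]-index convention are illustrative assumptions, not the actual Haystack API:

```python
import torch

def per_sequence_no_answer_confidence(
    start_logits: torch.Tensor,  # shape: [seq_len]
    end_logits: torch.Tensor,    # shape: [seq_len]
    temperature: float = 1.0,
) -> torch.Tensor:
    # Temperature-scaled softmax over this sequence's logits.
    start_probs = torch.softmax(start_logits / temperature, dim=-1)
    end_probs = torch.softmax(end_logits / temperature, dim=-1)
    # By SQuAD-v2 convention the no_answer candidate is start == end == 0 ([CLS]),
    # so its confidence is derived from the probability mass at index 0.
    return start_probs[0] * end_probs[0]
```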
Describe the solution you'd like
- Specifying `use_confidence_scores` should not change the ranking, or at least the chance of a ranking change should be minimal.
- FARMReader should not heavily inflate low per-sequence no_answer scores.

TODO
Describe alternatives you've considered
TODO
Additional context
Per-sequence softmaxed logits could be propagated from the FARM model to FARMReader. However, this is not possible for TransformersReader.
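A very rough sketch of where such propagated values could be consumed; the field name `per_seq_no_answer_confidence` and the min-aggregation are purely hypothetical and only illustrate the idea:

```python
from typing import List

def query_level_no_answer_confidence(per_seq_no_answer_confidence: List[float]) -> float:
    # Hypothetical aggregation of per-sequence no_answer confidences propagated
    # from the FARM model: if any passage is confident it contains an answer,
    # the overall no_answer confidence stays low instead of being inflated
    # by a sigmoid over raw logits.
    return min(per_seq_no_answer_confidence)
```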
/cc @julian-risch