Improve no_answer values when `use_confidence_scores=True` in `FARMReader`
Is your feature request related to a problem? Please describe.
In #2853 it turned out that when setting `use_confidence_scores=True` we cannot expect the same ranking as with `use_confidence_scores=False`:
- transforming regular answer scores to confidence values keeps the order within regular answers
- transforming no_answer scores to confidence values often produces a different order
This is because calculating the confidence scores for no_answers does not take the original per-sequence softmaxed logits of the model output into account, but simply scales the per-query no_answer logit score via sigmoid: https://github.com/deepset-ai/haystack/blob/1f5b9bd69b42209a2f276ba848988243253e9bc7/haystack/nodes/reader/base.py#L33-L55
This produces some funny situations: even though the model calculated a per-sequence confidence score of about 1%, the returned confidence score could be above 50-60% (even if only one sequence was considered).
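To make the mismatch concrete, here is a minimal numerical sketch; the logit values are invented and the sigmoid scaling only approximates what the linked base.py code does with the aggregated no_answer score:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Invented logits for one sequence: [no_answer, answer_span_1, answer_span_2].
logits = np.array([0.5, 5.0, 4.5])

# Per-sequence view: softmax over all candidates of this sequence.
# The no_answer candidate gets well under 1% of the probability mass.
per_sequence_probs = np.exp(logits) / np.exp(logits).sum()
print(per_sequence_probs[0])   # ~0.007

# Per-query view as currently returned: a sigmoid over the no_answer score alone
# (approximated here by the raw no_answer logit plus a scaling constant).
print(sigmoid(logits[0] / 8))  # ~0.52, i.e. above 50% despite <1% softmax mass
```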
Additional context: The calculation of the original per-sequence confidence value within the model is done here: https://github.com/deepset-ai/haystack/blob/1f5b9bd69b42209a2f276ba848988243253e9bc7/haystack/modeling/model/prediction_head.py#L553
The calculation of the per-document confidence value within the model is done here; this might be an option for FARMReader to use if we propagate the required values accordingly: https://github.com/deepset-ai/haystack/blob/1f5b9bd69b42209a2f276ba848988243253e9bc7/haystack/modeling/model/prediction_head.py#L807
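For reference, a rough sketch of what a per-sequence no_answer confidence amounts to, assuming a temperature-scaled softmax over start/end logits as in the linked prediction head; the function name, the exact aggregation of start and end probabilities, and the [CLS]-index convention are illustrative assumptions, not the actual Haystack API:

```python
import torch

def per_sequence_no_answer_confidence(
    start_logits: torch.Tensor,  # shape: [seq_len]
    end_logits: torch.Tensor,    # shape: [seq_len]
    temperature: float = 1.0,
) -> torch.Tensor:
    # Temperature-scaled softmax over this sequence's logits.
    start_probs = torch.softmax(start_logits / temperature, dim=-1)
    end_probs = torch.softmax(end_logits / temperature, dim=-1)
    # By SQuAD-v2 convention the no_answer candidate is start == end == 0 ([CLS]),
    # so its confidence is derived from the probability mass at index 0.
    return start_probs[0] * end_probs[0]
```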
Describe the solution you'd like
- Specifying `use_confidence_scores` should not change the ranking, or at least the chance of a ranking change should be minimal.
- FARMReader should not heavily inflate low per-sequence no_answer scores.

TODO
Describe alternatives you've considered
TODO
Additional context
Per-sequence softmaxed logits could be propagated from the FARM model to FARMReader. However, this is not possible for TransformersReader.
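A very rough sketch of where such propagated values could be consumed; the field name `per_seq_no_answer_confidence` and the min-aggregation are purely hypothetical and only illustrate the idea:

```python
from typing import List

def query_level_no_answer_confidence(per_seq_no_answer_confidence: List[float]) -> float:
    # Hypothetical aggregation of per-sequence no_answer confidences propagated
    # from the FARM model: if any passage is confident it contains an answer,
    # the overall no_answer confidence stays low instead of being inflated
    # by a sigmoid over raw logits.
    return min(per_seq_no_answer_confidence)
```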
/cc @julian-risch