
no_answer_score scaling is redundant

Open julian-risch opened this issue 2 years ago • 1 comments

Currently, the no_answer scores use an expit(score/8) function to be scaled to the interval 0 to 1. However, the score should be between 0 and 1 even before that because of the softmax function that we use to calculate confidence scores. Thus, the expit(score/8) function could probably be removed.
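As a rough illustration of the scaling in question, here is a minimal sketch. It assumes `expit` is the logistic sigmoid (as in `scipy.special.expit`), reimplemented with `math` so the snippet is self-contained; `scale_score` is a hypothetical helper name, not a function from the Haystack code base.

```python
import math


def expit(x: float) -> float:
    """Logistic sigmoid; a scalar stand-in for scipy.special.expit."""
    return 1.0 / (1.0 + math.exp(-x))


def scale_score(score: float) -> float:
    # Hypothetical helper mirroring the expit(score / 8) scaling discussed above.
    return expit(score / 8)


# Unbounded, logit-like inputs are mapped strictly inside (0, 1).
for raw in (-20.0, 0.0, 20.0):
    print(raw, scale_score(raw))

# If the input were already in [0, 1], the output would be squeezed into
# a narrow band starting at 0.5, so the scaling would add little.
print(scale_score(0.0), scale_score(1.0))
```

Whether the scaling is redundant therefore depends on whether the input is a raw logit-based score or a softmax-based one.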

Further, the if/else block can be simplified because the same code is executed in both cases: no_ans_score = best_score_answer - max_no_ans_gap

julian-risch avatar May 18 '22 12:05 julian-risch

Hey @julian-risch! I suppose that you are talking about this method: https://github.com/deepset-ai/haystack/blob/96bb9b5905dcef61264f40ead6eea1ecaf2991b7/haystack/nodes/reader/base.py#L33-L63

  • (If-else block was removed with #2842)
  • I'm not sure that the scaling is redundant: based on my experiments, no_ans_score might not be in the interval 0 to 1, even if we are using confidence scores.

However, perhaps I did not understand the problem or the proposal well. Please let me know...

anakin87 avatar Sep 10 '22 15:09 anakin87

@julian-risch can you follow up here?

masci avatar Nov 08 '22 13:11 masci

@anakin87 Hi Stefano, yes, you're right that the if-else block has been removed already. 👍

Regarding redundancy of the scaling: in the prediction head we get scores from a matrix of logits, which are not in the interval 0 to 1, here: https://github.com/deepset-ai/haystack/blob/3319ef6d1c8f0b8a4d80d0f531c6ea0fd1c02e6e/haystack/modeling/model/prediction_head.py#L536

Confidence is calculated from a different matrix containing the softmax of the logits, so it is already in the interval 0 to 1, here: https://github.com/deepset-ai/haystack/blob/3319ef6d1c8f0b8a4d80d0f531c6ea0fd1c02e6e/haystack/modeling/model/prediction_head.py#L538

And the no_answer_score uses the former matrix with the logits: https://github.com/deepset-ai/haystack/blob/3319ef6d1c8f0b8a4d80d0f531c6ea0fd1c02e6e/haystack/modeling/model/prediction_head.py#L565

So I agree that the no_answer_score isn't scaled already and there is no redundancy. This issue can therefore be closed. Thank you for running some experiments. That's the best way to find out quickly. 👍
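The distinction between the two matrices can be sketched in a few lines. This is only an illustration under stated assumptions: the logit values and the `max_no_ans_gap` value are made up, `softmax` and `expit` are plain reimplementations, and none of this is the actual prediction head code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expit(x):
    """Logistic sigmoid; a scalar stand-in for scipy.special.expit."""
    return 1.0 / (1.0 + math.exp(-x))


# Made-up logits for three candidate answer spans.
logits = [7.5, 2.0, -3.0]

# Confidence comes from the softmax of the logits, so it is already in [0, 1].
probs = softmax(logits)

# The no-answer score is derived from raw logits (formula quoted in the issue),
# so it can fall outside [0, 1]. Both values here are hypothetical.
best_score_answer = max(logits)
max_no_ans_gap = -4.0
no_ans_score = best_score_answer - max_no_ans_gap

# expit(score / 8) brings the logit-based score back into (0, 1).
scaled_no_ans_score = expit(no_ans_score / 8)

print(probs, no_ans_score, scaled_no_ans_score)
```

With these made-up numbers the unscaled no-answer score is well above 1, which matches the observation that the scaling is not redundant for logit-based scores.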

julian-risch avatar Nov 09 '22 17:11 julian-risch