
How to let the model say 'I don't know'?

GuokaiLiu opened this issue 3 years ago · 2 comments

Hi Lingkai

Thanks for sharing the great code for this fantastic paper. I would like to ask a few questions if possible.

  1. How to let the model say 'I don't know'?

For OOD samples, it is wiser to let the model say 'I don't know' instead of making an absurdly wrong prediction.

After training SDE-Net, how can we use it to flag uncertain examples, especially those from unseen classes?

  2. What does 'detection accuracy' mean for the OOD task in 'calculate_log.py'?

For the 'mis' task, the detection error is easy to understand. For the 'OOD' task, a trained SDE-Net outputs softmax values (without ground-truth information) for both in-domain examples and OOD examples, so I am confused about the detection accuracy of the OOD task. Is there a math expression or description of OOD detection accuracy?

# calculate the minimum detection error (excerpt from calculate_log.py;
# assumes `import numpy as np` and that `task`/`dir_name` are defined earlier in the file)
if task == 'OOD':
    cifar = np.loadtxt('%s/confidence_Base_In.txt' % dir_name, delimiter=',')   # in-distribution scores
    other = np.loadtxt('%s/confidence_Base_Out.txt' % dir_name, delimiter=',')  # OOD scores

    Y1 = other
    X1 = cifar
    end = np.max([np.max(X1), np.max(Y1)])
    start = np.min([np.min(X1), np.min(Y1)])
    gap = (end - start) / 200000

    errorBase = 1.0
    # sweep the threshold delta over the observed score range and keep the smallest balanced error
    for delta in np.arange(start, end, gap):
        tpr = np.sum(X1 < delta) / float(len(X1))      # fraction of in-distribution samples rejected
        error2 = np.sum(Y1 > delta) / float(len(Y1))   # fraction of OOD samples accepted
        errorBase = np.minimum(errorBase, (tpr + error2) / 2.0)

  3. What do the arrows in Figure 3 mean?

In Figure 3, both ID data and OOD data run through the f-net and the g-net, merge, and finally produce predictions. This makes sense for the training process. However, the g-net seems useless in the test process? If we want the model to evaluate its confidence/uncertainty for a specific example, should we use the g-net and set a threshold?

Some other questions may come later. Sincere thanks for your kind help : )

GuokaiLiu · Nov 13 '20

Hi, thanks for your interest in our work.

  1. We can set a threshold: if the uncertainty is larger than the threshold, the model can refuse to give a prediction and we can let a human expert intervene (see the sketch after this list).
  2. Please see page 12 of the paper: https://arxiv.org/pdf/1711.09325.pdf
  3. In the test process, the data also needs to run through both the f-net and the g-net.
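
For point 1, here is a minimal sketch of the reject option, assuming uncertainty is estimated as the variance of the softmax outputs over several stochastic forward passes (both f-net and g-net are active at test time, so repeated passes differ). The function name, number of passes, and threshold below are only illustrative, not the repository's exact API:

import torch
import torch.nn.functional as F

def predict_or_reject(model, x, n_passes=10, threshold=0.05):
    """Average several stochastic forward passes for the prediction and use
    their variance as the uncertainty; return -1 ("I don't know") when the
    uncertainty exceeds the threshold so a human expert can intervene."""
    model.eval()
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=1) for _ in range(n_passes)])
    mean_prob = probs.mean(dim=0)               # averaged class probabilities
    uncertainty = probs.var(dim=0).sum(dim=1)   # total variance across classes
    prediction = mean_prob.argmax(dim=1)
    rejected = torch.full_like(prediction, -1)  # -1 marks "I don't know"
    return torch.where(uncertainty > threshold, rejected, prediction), uncertainty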

Let me know if you have any other questions.

Lingkai-Kong · Nov 13 '20

Thanks for your timely response : )

  1. For example, if p = max(softmax(logits)) and the threshold is v, then when p < v we let a human expert intervene? Is that right?

  2. Thanks for sharing this reference. I copy the contents below for those who may have the same question (a small code sketch of this metric follows after this list). I am not sure whether this metric is debatable: it seems to assume that all in-domain examples achieve higher softmax outputs than out-of-domain examples, and this assumption may not hold true in practice.

Detection accuracy. This metric corresponds to the maximum classification probability over all possible thresholds δ: 1 − min_δ { P_in(q(x) ≤ δ) P(x is from P_in) + P_out(q(x) > δ) P(x is from P_out) }, where q(x) is a confidence score such as the maximum value of softmax. We assume that both positive and negative examples have equal probability of appearing in the test set, i.e., P(x is from P_in) = P(x is from P_out) = 0.5.

  3. I just figured it out in the code. 👍
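
For anyone with the same question: the quoted definition maps directly onto the threshold sweep in calculate_log.py above. Here is a minimal numpy sketch of the formula; the names in_scores / out_scores are just placeholders for the two confidence files (confidence_Base_In.txt and confidence_Base_Out.txt):

import numpy as np

def detection_accuracy(in_scores, out_scores, n_thresholds=200000):
    # 1 - min_delta 0.5 * [ P_in(q(x) <= delta) + P_out(q(x) > delta) ],
    # assuming ID and OOD samples are equally likely in the test set
    start = min(in_scores.min(), out_scores.min())
    end = max(in_scores.max(), out_scores.max())
    best_error = 1.0
    for delta in np.linspace(start, end, n_thresholds):
        p_in_rejected = np.mean(in_scores <= delta)    # ID samples wrongly rejected
        p_out_accepted = np.mean(out_scores > delta)   # OOD samples wrongly accepted
        best_error = min(best_error, 0.5 * (p_in_rejected + p_out_accepted))
    return 1.0 - best_error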

Thank you : )

GuokaiLiu · Nov 14 '20