A bit confused about the `Gram` implementation
https://github.com/Jingkang50/OpenOOD/blob/main/openood/postprocessors/gram_postprocessor.py#L115
I wonder why the deviation is used as the confidence score directly. Isn't it the case that the larger the deviation, the more likely the sample is OOD?
I checked the original implementation here: https://github.com/VectorInstitute/gram-ood-detection/blob/master/ResNet_Cifar10.ipynb and found that it actually uses the negative of the deviations when computing the metrics.
But the reported metrics look OK, so I am quite confused. Am I missing something?
It's indeed confusing. Like you said, we are getting >50% AUROC for Gram in all our experiments, even though applying the negation seems like the correct thing to do but leads to <50% AUROC. I will do some investigation when I'm available.
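To make the sign convention concrete, here is a minimal sketch (not the actual OpenOOD code; the helper `auroc` and the synthetic deviations are illustrative) of why negating the scores flips AUROC to exactly 1 − AUROC, assuming OpenOOD's convention that a higher confidence score means more likely in-distribution:

```python
import numpy as np

def auroc(scores_id, scores_ood):
    """AUROC where a higher score should indicate in-distribution (ID).

    Computed as the probability that a randomly chosen ID sample scores
    higher than a randomly chosen OOD sample (ties count 0.5), via a
    pairwise comparison matrix.
    """
    scores_id = np.asarray(scores_id, dtype=float)
    scores_ood = np.asarray(scores_ood, dtype=float)
    diff = scores_id[:, None] - scores_ood[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

# Synthetic Gram-style deviations: ID samples deviate little from the
# training-time Gram-matrix ranges, OOD samples deviate a lot.
rng = np.random.default_rng(0)
dev_id = rng.normal(1.0, 0.5, 100)
dev_ood = rng.normal(3.0, 0.5, 100)

a_raw = auroc(dev_id, dev_ood)    # raw deviation as confidence: < 0.5
a_neg = auroc(-dev_id, -dev_ood)  # negated deviation: > 0.5
assert a_raw < 0.5 < a_neg
assert abs(a_raw + a_neg - 1.0) < 1e-9  # negation gives exactly 1 - AUROC
```

So if the detector itself were working, the negated deviation would score above 50%; getting below 50% after negation suggests the bug is upstream of the sign, not the sign alone.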
Hi, I worked on the Gram matrix method and I seem to have fixed the implementation here.
The results on CIFAR10 obtained with the current gram-matrix implementation are as follows:
| | FPR@95 | AUROC | AUPR_IN | AUPR_OUT | ACC |
| --- | --- | --- | --- | --- | --- |
| cifar100 | 91.68 ± 2.24 | 58.33 ± 4.49 | 56.74 ± 3.87 | 59.24 ± 4.62 | 95.06 ± 0.30 |
| tin | 90.06 ± 1.59 | 58.98 ± 5.19 | 61.65 ± 3.75 | 55.89 ± 5.56 | 95.06 ± 0.30 |
| nearood | 90.87 ± 1.91 | 58.66 ± 4.83 | 59.19 ± 3.79 | 57.57 ± 5.09 | 95.06 ± 0.30 |
| mnist | 70.30 ± 8.96 | 72.64 ± 2.34 | 36.92 ± 8.23 | 93.36 ± 1.21 | 95.06 ± 0.30 |
| svhn | 33.91 ± 17.35 | 91.52 ± 4.45 | 82.40 ± 8.85 | 96.62 ± 1.81 | 95.06 ± 0.30 |
| texture | 94.64 ± 2.71 | 62.34 ± 8.27 | 67.93 ± 5.60 | 55.93 ± 10.76 | 95.06 ± 0.30 |
| places365 | 90.49 ± 1.93 | 60.44 ± 3.41 | 26.94 ± 2.62 | 85.64 ± 1.31 | 95.06 ± 0.30 |
| farood | 72.34 ± 6.73 | 71.74 ± 3.20 | 53.55 ± 4.74 | 82.89 ± 3.14 | 95.06 ± 0.30 |
With the corrected implementation, I was able to get:
| | FPR@95 | AUROC | AUPR_IN | AUPR_OUT | ACC |
| --- | --- | --- | --- | --- | --- |
| cifar100 | 61.61 ± 0.82 | 84.61 ± 0.20 | 84.21 ± 0.20 | 83.75 ± 0.32 | 95.06 ± 0.30 |
| tin | 51.99 ± 1.16 | 87.16 ± 0.52 | 88.46 ± 0.41 | 84.34 ± 0.83 | 95.06 ± 0.30 |
| nearood | 56.80 ± 0.62 | 85.88 ± 0.35 | 86.33 ± 0.28 | 84.04 ± 0.56 | 95.06 ± 0.30 |
| mnist | 7.31 ± 1.02 | 97.57 ± 0.49 | 94.37 ± 0.85 | 99.48 ± 0.14 | 95.06 ± 0.30 |
| svhn | 6.67 ± 0.29 | 98.64 ± 0.02 | 96.73 ± 0.06 | 99.48 ± 0.04 | 95.06 ± 0.30 |
| texture | 14.86 ± 0.71 | 96.95 ± 0.11 | 97.99 ± 0.12 | 95.53 ± 0.09 | 95.06 ± 0.30 |
| places365 | 42.81 ± 2.19 | 89.56 ± 0.80 | 73.32 ± 1.45 | 96.53 ± 0.33 | 95.06 ± 0.30 |
| farood | 17.91 ± 0.70 | 95.68 ± 0.25 | 90.60 ± 0.35 | 97.75 ± 0.09 | 95.06 ± 0.30 |
When I used the same checkpoints with the code referred to by @SauceCat, I got marginally higher results for SVHN but did not test the other datasets. The new code is not fully polished but seems to be working as expected. I also did not run experiments with datasets other than CIFAR10 as InD.
Thank you for the OpenOOD benchmark and for considering the Gram matrix method for inclusion in it!
@chandramouli-sastry Thanks for sharing the results, and glad to see the much-improved numbers with the updated implementation. Would you mind opening a pull request for this? Meanwhile, we will update the Gram matrix results in both the paper and the leaderboard.
Thank you! I just created a pull request for your review.