BinaryEvaluation often results in NaN for best F1-Scores
I then printed out all F1-Scores of the ConfusionMatrices-List, which look as expected but have some NaN values at the end. Something like:
Threshold: F1-Score
0.1: 0.1
0.2: 0.2
0.4: 0.3
0.5: 0.4
0.6: 0.3
0.8: 0.2
0.9: 0.1
0.99: NaN
0.999: NaN
It should pick 0.5 as best threshold.
The problem seems to be that the NaN values are also compared: Double.compare(someNumber, NaN) returns -1 and Double.compare(NaN, someNumber) returns 1, so NaN is ordered above every real score and wins the max. Perhaps something like this would be better:
BinaryConfusionMatrix highestF1CM = eval.getConfusionMatrices().stream()
    .filter(x -> Double.isFinite(x.getF1Score()))
    .max(Comparator.comparingDouble(BinaryConfusionMatrix::getF1Score))
    .get();
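To make the ordering issue concrete, here is a small standalone sketch using the thresholds and scores listed above. The Row class, the NanMaxDemo class and its methods are hypothetical stand-ins for illustration only; they are not the library's BinaryConfusionMatrix or BinaryEvaluation API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class NanMaxDemo {
    // Hypothetical stand-in for one (threshold, F1) pair; not the library's class.
    static final class Row {
        final double threshold;
        final double f1;
        Row(double threshold, double f1) {
            this.threshold = threshold;
            this.f1 = f1;
        }
    }

    // Plain max: Double.compare orders NaN above every other value, so a NaN row wins.
    static Optional<Row> naiveBest(List<Row> rows) {
        return rows.stream().max(Comparator.comparingDouble((Row r) -> r.f1));
    }

    // Filtered max: drop non-finite scores before comparing.
    static Optional<Row> finiteBest(List<Row> rows) {
        return rows.stream()
                .filter(r -> Double.isFinite(r.f1))
                .max(Comparator.comparingDouble((Row r) -> r.f1));
    }

    // The example values from this report.
    static List<Row> sample() {
        return Arrays.asList(
                new Row(0.1, 0.1), new Row(0.2, 0.2), new Row(0.4, 0.3),
                new Row(0.5, 0.4), new Row(0.6, 0.3), new Row(0.8, 0.2),
                new Row(0.9, 0.1), new Row(0.99, Double.NaN), new Row(0.999, Double.NaN));
    }

    public static void main(String[] args) {
        List<Row> rows = sample();
        System.out.println("naive best F1:    " + naiveBest(rows).get().f1);
        System.out.println("filtered best at: " + finiteBest(rows).get().threshold);
    }
}
```

Returning an Optional instead of calling .get() directly also covers the case where every score is NaN and the filtered stream is empty.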
Apologies for the late response, and thank you for raising this issue. I think the underlying problem here is that, at high thresholds, you're in the very unusual position where both your recall and precision are 0 (e.g. you have one highest-scoring example whose true label is false), and thus their F1-score (harmonic mean) is calculated as 0/0, which is NaN. We should be able to resolve this by treating 0/0 as 0 in this context. Please expect to see a diff to this effect by tomorrow.
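As a rough illustration of that idea (not the actual diff), a guarded F1 computation could look like the sketch below. The SafeF1 class and its method names are hypothetical and exist only for this example.

```java
public class SafeF1 {
    // Unguarded harmonic mean: 2 * 0 * 0 / (0 + 0) evaluates to 0.0 / 0.0, which is NaN.
    static double naiveF1(double precision, double recall) {
        return 2.0 * precision * recall / (precision + recall);
    }

    // Guarded version: define the 0/0 case as 0 (assumes precision and recall are non-negative).
    static double f1(double precision, double recall) {
        double denominator = precision + recall;
        if (denominator == 0.0) {
            return 0.0;
        }
        return 2.0 * precision * recall / denominator;
    }

    public static void main(String[] args) {
        System.out.println("naive:   " + naiveF1(0.0, 0.0));
        System.out.println("guarded: " + f1(0.0, 0.0));
    }
}
```

With a guard like this, thresholds where both precision and recall are 0 get an F1 of 0 and can no longer outrank real scores when taking the max.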