Fix: tuneThreshold - Minimization for measures needing maximization
When using tuneThreshold to find the best threshold for a measure that needs to be maximized, it does not return the maximum but the minimum (i.e. the worst threshold), and the corresponding performance value is reported with the wrong sign.
Reproduce:
library(mlr)
model = train(makeLearner("classif.rpart", predict.type = "prob"), sonar.task)
preds = predict(model, sonar.task)
performance(preds, bac)
# bac
# 0.8763815
tuneThreshold(preds, bac) # minimum instead of maximum returned (with wrong sign)
# $th
# [1] 0.9309793
#
# $perf
# bac
# -0.545045
d = generateThreshVsPerfData(preds, bac)$data
min(d$bac[2:99]) # min bac
# 0.545045
max(d$bac[2:99]) # max bac
# 0.8763815
Cause
The callback function defined in tuneThreshold turns the problem into a minimization problem for all measures (i.e. also for measures that need to be maximized):
ifelse(measure$minimize, 1, -1) * performance(setThreshold(pred, x), measure, task, model, simpleaggr = TRUE)
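To see the effect of the sign flip for a maximization measure such as bac (an interactive illustration only, not part of the mlr source):
bac$minimize   # bac is a measure to be maximized
# [1] FALSE
ifelse(bac$minimize, 1, -1)   # so the callback multiplies the performance by -1
# [1] -1
# Minimizing -bac over the threshold is equivalent to maximizing bac,
# hence the optimizer must always minimize the callback.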
When optimizeSubInts is called, however, its maximum flag is set depending on the measure's minimize flag, even though the sign flip has already been handled and the problem is always a minimization problem at this point. This leads to finding the overall minimum instead of the maximum for measures that need to be maximized (e.g. bac, acc, etc.).
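The relevant call presumably looks roughly like this (a paraphrased sketch: fn stands for the callback above, nsub for the number of subintervals; the exact code in tuneThreshold may differ):
# maximum is derived from the measure, although fn is already a
# minimization problem for every measure due to the sign flip:
or = optimizeSubInts(f = fn, interval = c(0, 1), maximum = !measure$minimize, nsub = nsub)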
Fix
After changing the call to optimizeSubInts so that it always searches for the minimum, the example above works correctly.
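A rough sketch of the change, using the same paraphrased names as above (the actual diff may differ slightly):
# Always minimize, since the callback already flips the sign for
# maximization measures:
or = optimizeSubInts(f = fn, interval = c(0, 1), maximum = FALSE, nsub = nsub)
# Map the objective back to the original scale so the reported
# performance has the correct sign:
perf = ifelse(measure$minimize, 1, -1) * or$objective
With this change, rerunning the reproduction example gives: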
> model = train(makeLearner("classif.rpart", predict.type = "prob"), sonar.task)
> preds = predict(model, sonar.task)
> performance(preds, bac)
bac
0.8763815
> tuneThreshold(preds, bac)
$th
[1] 0.5309993
$perf
bac
0.8763815
> d = generateThreshVsPerfData(preds, bac)$data
> min(d$bac[2:99])
[1] 0.545045
> max(d$bac[2:99])
[1] 0.8763815
@pat-s @larskotthoff @mllg @berndbischl please review, thank you.