
Fix: tuneThreshold - Minimization for measures needing maximization

Open · jokokojote opened this issue 1 year ago · 1 comment

When using tuneThreshold to find the best threshold for a measure that needs to be maximized, it does not work: it returns the threshold at which the measure is minimal (i.e. the worst threshold) and reports the corresponding performance with the wrong sign.

Reproduce:

library(mlr)

model = train(makeLearner("classif.rpart", predict.type = "prob"), sonar.task)
preds = predict(model, sonar.task)

performance(preds, bac)
# bac
# 0.8763815

tuneThreshold(preds, bac) # minimum instead of maximum returned (with wrong sign)
# $th
# [1] 0.9309793
#
# $perf
# bac
# -0.545045

d = generateThreshVsPerfData(preds, bac)$data
min(d$bac[2:99]) # min bac
# 0.545045
max(d$bac[2:99]) # max bac
# 0.8763815
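
Until the fix lands, a possible workaround can be sketched from the grid above: pick the best threshold directly from the data returned by generateThreshVsPerfData. This assumes the $data slot contains a threshold column next to the bac column and reuses preds from the snippet above.

d = generateThreshVsPerfData(preds, bac)$data
best = d[which.max(d$bac), ]  # row with the highest bac on the threshold grid
best$threshold                # hypothetical workaround: best threshold found on the grid
best$bac                      # corresponding (correctly signed) performance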

Cause

The objective function defined in tuneThreshold turns the problem into a minimization problem for every measure (i.e. also for measures that need to be maximized) by flipping the sign: ifelse(measure$minimize, 1, -1) * performance(setThreshold(pred, x), measure, task, model, simpleaggr = TRUE)

When optimizeSubInts is then called, its maximum flag is again set depending on the measure's minimize flag, even though the direction was already handled and the objective is always a minimization problem at this point. As a result, for measures that need to be maximized (e.g. bac, acc), the overall minimum is found instead of the maximum; the sketch below illustrates both the buggy and the corrected direction handling.
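
The following minimal, self-contained sketch reproduces the double handling of the optimization direction, with stats::optimize() standing in for optimizeSubInts(); perf() and measure_minimize are hypothetical stand-ins, not mlr internals.

perf = function(x) 1 - (x - 0.3)^2   # toy performance curve, best threshold at 0.3
measure_minimize = FALSE             # a measure such as bac that needs maximization

# tuneThreshold-style objective: the sign flip turns everything into minimization
fn = function(x) ifelse(measure_minimize, 1, -1) * perf(x)

# buggy pattern: the direction is derived from the measure a second time, so the
# already negated objective gets maximized and the worst threshold wins
buggy = optimize(fn, interval = c(0, 1), maximum = !measure_minimize)
buggy$maximum    # close to 1, the worst threshold of the toy curve

# fixed pattern: fn is already a minimization problem, so always minimize
fixed = optimize(fn, interval = c(0, 1), maximum = FALSE)
fixed$minimum    # ~0.3, the best threshold
-fixed$objective # flip the sign back to recover the true performance (~1)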

Fix

After changing the call to optimizeSubInts so that it always searches for the minimum, the reproduction example works correctly (a sketch of the presumed change follows the output below):

> model = train(makeLearner("classif.rpart", predict.type = "prob"), sonar.task)
> preds = predict(model, sonar.task)
> performance(preds, bac)
      bac 
0.8763815 
> tuneThreshold(preds, bac) 
$th
[1] 0.5309993

$perf
      bac 
0.8763815 

> d = generateThreshVsPerfData(preds, bac)$data
> min(d$bac[2:99]) 
[1] 0.545045
> max(d$bac[2:99]) 
[1] 0.8763815
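
For reference, the change described above presumably amounts to a one-line adjustment of the optimizeSubInts call inside tuneThreshold. The snippet below is only a hedged sketch: the argument names (f, interval) are assumptions based on this issue's description, not the verbatim mlr source.

# before: optimization direction derived from the measure a second time
# or = optimizeSubInts(f = fn, interval = c(0, 1), maximum = !measure$minimize)

# after: fn is already a minimization problem, so always search for the minimum
# or = optimizeSubInts(f = fn, interval = c(0, 1), maximum = FALSE)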

jokokojote · Mar 28 '23 17:03

@pat-s @larskotthoff @mllg @berndbischl please review, thank you.

jokokojote · Mar 28 '23 17:03