mlr3
Relax autotest importance and selected features naming requirement
There are a few learners which, for some reason or other (usually conversion to a model matrix), change the original variable names. This means that variable importance and selected features could be offered by the learner but can't be, because the autotest requires these to return the same variable names as the original task. It's a shame to have to get rid of these useful methods because of this restriction.
A few alternative suggestions:
- Allow `importance` and `selected_features` to be added to the `exclude` argument in `run_autotest`
- In the autotest, first check if these names match the original task names and then check if they match a model.matrix name
- Remove this check altogether
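As a rough illustration of the second suggestion, the two-stage check could look something like the sketch below. The helper `names_match_task()` and the toy data are hypothetical and are not part of mlr3 or its autotest; the sketch also assumes the learner builds its design matrix with the default `model.matrix(~ .)` contrasts, which may not hold for every learner.

```r
# Hypothetical helper (not part of mlr3): accept names that match either the
# task's feature names or the column names of a model matrix built from them.
names_match_task = function(reported, data, feature_names) {
  # Stage 1: the reported names are plain task feature names
  if (all(reported %in% feature_names)) {
    return(TRUE)
  }
  # Stage 2: the reported names are model.matrix column names
  # (assumes default treatment contrasts and an intercept column)
  mm = model.matrix(~ ., data = data[, feature_names, drop = FALSE])
  all(reported %in% colnames(mm))
}

# Toy data.frame standing in for the task data
data = data.frame(
  cmedv = c(24, 21.6, 34.7, 33.4),
  town = factor(c("Nahant", "Salem", "Salem", "Natick"))
)
features = c("cmedv", "town")

ok_original = names_match_task(c("cmedv", "town"), data, features)
ok_matrix = names_match_task(c("(Intercept)", "cmedv", "townSalem"), data, features)
ok_unknown = names_match_task(c("cmedv", "townBoston"), data, features)
```

Here the first two calls pass and the third fails, because `townBoston` is neither a feature nor a model-matrix column of the toy data.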
Reprex below for an important example (i.e. where selection is often of interest)
library(mlr3); library(mlr3learners.mboost)
learn = lrn("regr.glmboost")
task = tsk("boston_housing")
learn$train(task)
variable.names(learn$model, usedonly = TRUE)
#> (Intercept) cmedv townBoston Savin Hill
#> "(Intercept)" "cmedv" "townBoston Savin Hill"
#> townLexington townNatick townSalem
#> "townLexington" "townNatick" "townSalem"
#> townWinthrop
#> "townWinthrop"
Created on 2020-04-29 by the reprex package (v0.3.0)
> It's a shame to have to get rid of these useful methods because of this restriction.
I'd say it's unfortunate, not a shame.
I'd say 2) sounds good. In your example, are the only differences the quotes, which is why each variable appears twice? This could easily be accounted for internally. Are there any other, more complicated, cases of name alteration?
Sorry, I forgot to reply before. The example above omits the original task feature names; see the reprex below. Contrast the names returned by the model (first output) with the originals (second).
library(mlr3); library(mlr3learners.mboost)
learn = lrn("regr.glmboost")
task = tsk("boston_housing")
learn$train(task)
variable.names(learn$model, usedonly = TRUE)
#> (Intercept) cmedv townBoston Savin Hill
#> "(Intercept)" "cmedv" "townBoston Savin Hill"
#> townLexington townNatick townSalem
#> "townLexington" "townNatick" "townSalem"
#> townWinthrop
#> "townWinthrop"
task$feature_names
#> [1] "age" "b" "chas" "cmedv" "crim" "dis" "indus"
#> [8] "lat" "lon" "lstat" "nox" "ptratio" "rad" "rm"
#> [15] "tax" "town" "tract" "zn"
Created on 2020-05-07 by the reprex package (v0.3.0)
Related to #401.
selected_features can easily be fixed. For importance, it is more difficult because it is unclear how to aggregate multiple scores into a single one.
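To make the distinction concrete, here is a minimal base R sketch of mapping model-matrix column names back to the original features via the `assign` attribute. The data, the selected columns, and the per-column scores are made up for illustration. Summing the scores per feature is only one arbitrary choice, shown to highlight the open question; it is not a proposed answer.

```r
# Toy data with one numeric and one factor feature
data = data.frame(
  cmedv = c(24, 21.6, 34.7, 33.4, 36.2),
  town = factor(c("Nahant", "Salem", "Salem", "Natick", "Nahant"))
)
mm = model.matrix(~ ., data = data)

# The "assign" attribute gives, for each column, the index of the term it
# came from (0 is the intercept)
term_labels = attr(terms(~ ., data = data), "term.labels")
col_to_feature = setNames(
  c("(Intercept)", term_labels)[attr(mm, "assign") + 1L],
  colnames(mm)
)

# selected_features: map the selected columns back and deduplicate
selected_cols = c("(Intercept)", "cmedv", "townSalem") # made-up selection
selected = setdiff(unique(unname(col_to_feature[selected_cols])), "(Intercept)")

# importance: several columns map to one feature, so some aggregation is
# needed; sum() is an illustrative assumption, not a recommendation
col_scores = c(cmedv = 0.8, townNatick = 0.1, townSalem = 0.3) # made-up scores
groups = col_to_feature[names(col_scores)]
importance = vapply(split(col_scores, groups), sum, numeric(1))
```

The mapping step is unambiguous for `selected`, whereas `importance` changes depending on whether the per-column scores are summed, averaged, or reduced with a maximum.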
Do you have an overview of which learners need to be patched?