sl3
sl3 copied to clipboard
Unify internal structure for predictions
- Currently using a matrix to store all predictions.
- Should switch to data.table, with one column per model.
- The column can be of arbitrary type (i.e., a list)) or numeric, depending on the regression problem (binary class / regression or multinomial classif.)
I think this works for learners where the goal is predicting an outcome. However, it's not clear how this would work for "learners" where the goal is something else, for instance SL_Screener
currently generates a matrix of screened covariates when predict()
is called. This violates the "one column per model" paradigm. I think we need to allow learners to generate more than one column in the data table. We can still support stacking learners that generate arbitrary numbers of columns with cbind
.
This relates to currently enforced check: https://github.com/jeremyrcoyle/sl3/blob/master/tests/testthat/test_all_learners.R#L20
Can we switch that to
test_that("Learner can generate training set predictions",expect_equal(nrow(train_preds),nrow(task$X)))
for all "standard" learners? I.e., those that actually generate a) a single set of predictions of the new outcome (single column data.table) or b) several sets of predictions from different models, stored as data.table with several columns.
I still think that the best thing for screeners is to generate a predicted mutated design matrix, rather then the names of the selected covariates. In that case PCA can be used interchangeably with screeners, always generating the same types of the output.
Thoughts?
By making it nrow
instead of length
, you're enforcing that predict returns something that's matrix shaped. Is that what you want? This seems like it might break user expectations, as when we implement an s3
predict.Learner
method, it will return matricies/data.tables even from things like Learner_glm
, where a user might instead expect something vector shaped.
Alternatively, we could support both learners that return vectors and learners that return matricies by using sl3::safe_dim(x)[1]
Either way is fine with me. It seems like going for one prediction data type is always a safer and easier to maintain solution (I suggest that prediction always returns a data.table). We could also use your suggested approach for time being.
I would prefer to stay away from matrices as much as possible, unless we absolutely have to use them. When we do need to use, I think we should just perform the on the fly conversion as.matrix().
okay, it sounds like the easiest thing is to change it to sl3::safe_dim(x)[1]
for now, and then work on enforcing prediction types as a later effort