mlr3pipelines PipeOpTargetTrafo drops missing factor levels

PipeOpTargetTrafo drops factor levels that do not occur in the training data of the task, whereas mlr3 on its own keeps them.

library(mlr3)
library(mlr3pipelines)
options(mlr3.debug = TRUE)

task = tsk("boston_housing")
learner = lrn("regr.rpart")
graph = ppl("targettrafo",
  graph = learner,
  targetmutate.trafo = function(x) log(x),
  # exp() is the inverse of log(); expm1() would only invert log1p()
  targetmutate.inverter = function(x) list(response = exp(x$response)))
graph_learner = as_learner(graph)

# fails
resample(task, graph_learner, rsmp("holdout"))

# > Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object,  : 
# >  factor town has new levels Dover, Duxbury, Hamilton, Manchester, Marshfield, Medfield, Millis, Nahant, Weston
# > This happened PipeOp regr.rpart's $predict()

# works
resample(task, learner, rsmp("holdout"))

Using PipeOpFixFactors works around the issue, but maybe PipeOpTargetTrafo should not drop the factor levels in the first place?
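
A minimal sketch of that workaround, assuming po("fixfactors") is placed directly in front of the learner inside the target-trafo graph (the exact placement is an assumption, not taken from the report). It uses a matching log()/exp() pair for the transformation and its inverter:

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("boston_housing")

# Assumption: fix the factor levels right before the learner sees the data.
inner = po("fixfactors") %>>% lrn("regr.rpart")

graph = ppl("targettrafo",
  graph = inner,
  targetmutate.trafo = function(x) log(x),
  targetmutate.inverter = function(x) list(response = exp(x$response)))
graph_learner = as_learner(graph)

# Levels unseen during training are turned into NA at predict time,
# which regr.rpart can handle.
rr = resample(task, graph_learner, rsmp("holdout"))
```

This only helps for learners that tolerate missing values in the features.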

The bike sharing gallery post fails because of this (https://github.com/mlr-org/mlr3gallery/issues/119).

Nov 20 '21 12:11 be-marc

PipeOpFixFactors is not a solution for the gallery post: it introduces missing values, which regr.kknn does not support.
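
To illustrate where the missing values come from, here is a base R sketch that mimics recoding a test-time factor to the levels seen during training. It is not the PipeOpFixFactors implementation, and the town values are made up for the example:

```r
# Levels seen during training (hypothetical values).
train_town = factor(c("Boston", "Cambridge", "Boston"))

# Test data contains towns that never appeared in training.
test_town = factor(c("Boston", "Dover", "Nahant"))

# Recode the test factor to the training levels; unknown values become NA.
fixed = factor(as.character(test_town), levels = levels(train_town))

print(fixed)
print(is.na(fixed))
```

Any learner that cannot handle NA in a feature will then fail at predict time unless an imputation step follows.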

Nov 20 '21 12:11 be-marc

I think this could either be fixed "manually" in mlr3pipelines, or we fix mlr3::convert_task directly, which seems to cause the problem. During the trafo, a new Task is created from the DataBackend of the task. During resampling, however, it can happen that some levels are no longer present in the data; they are then also missing from the backend and, as a result, from the new Task.
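
The mechanism can be sketched in base R, outside of mlr3. This is an analogy under the assumption that the new Task derives its levels from the values present in the data, not the actual convert_task code, and the town values are made up:

```r
# Full data: four distinct towns.
town = factor(c("Boston", "Dover", "Nahant", "Boston", "Weston"))

# A training split that happens to contain only one town.
train_rows = c(1, 4)

# Plain subsetting keeps the level information of the full factor.
kept = town[train_rows]
print(levels(kept))

# Rebuilding the factor from the values alone loses the unused levels.
rebuilt = factor(as.character(kept))
print(levels(rebuilt))

# These levels are unknown to a model trained on `rebuilt`.
print(setdiff(levels(town), levels(rebuilt)))
```

A model fitted on the rebuilt factor then reports "new levels" at predict time, which matches the error shown in the report.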

Nov 29 '21 17:11 sumny