mlr3pipelines

PipeOpTargetTrafo drops missing factor levels

Open be-marc opened this issue 4 years ago • 2 comments

PipeOpTargetTrafo drops factor levels that do not occur in the task's data (e.g. levels that are absent from the training split during resampling), whereas mlr3 keeps all factor levels.

library(mlr3)
library(mlr3pipelines)
options(mlr3.debug = TRUE)

task = tsk("boston_housing")
learner = lrn("regr.rpart")
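graph = ppl("targettrafo",
  graph = learner,
  # log-transform the target for training ...
  targetmutate.trafo = function(x) log(x),
  # ... and back-transform predictions with exp(), the inverse of log()
  # (expm1() would only invert a log1p() trafo)
  targetmutate.inverter = function(x) list(response = exp(x$response)))
graph_learner = as_learner(graph)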

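# fails: the task built by the target trafo lost the levels of `town` that are absent from the training split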
resample(task, graph_learner, rsmp("holdout"))

# > Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object,  : 
# >  factor town has new levels Dover, Duxbury, Hamilton, Manchester, Marshfield, Medfield, Millis, Nahant, Weston
# > This happened PipeOp regr.rpart's $predict()

# works: the plain learner keeps all levels of `town`
resample(task, learner, rsmp("holdout"))
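As the traceback shows, the error is raised by model.frame(), which rpart's predict method calls with the factor levels recorded at training time (`xlev`). The following base-R sketch reproduces that mechanism on hypothetical toy data (the data frame and its values are made up for illustration): subsetting rows keeps all levels, while rebuilding the data drops unused ones, and only the latter makes model.frame() reject the test rows.

```r
# Hypothetical toy data: level "C" of `town` occurs only in the test rows.
df = data.frame(
  y    = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.1),
  town = factor(c("A", "A", "B", "B", "C", "C"))
)
train = df[1:4, ]
test  = df[5:6, ]

# Row subsetting keeps the full set of levels (as plain mlr3 does) ...
levels(train$town)
# ... whereas rebuilding the data drops the unused level "C".
train_dropped = droplevels(train)
levels(train_dropped$town)

# With the full training levels, the test rows are accepted:
mf_ok = model.frame(~ town, test, xlev = list(town = levels(train$town)))

# With the dropped training levels, model.frame() raises a "new level" error:
err = tryCatch(
  model.frame(~ town, test, xlev = list(town = levels(train_dropped$town))),
  error = function(e) e
)
conditionMessage(err)
```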

Adding PipeOpFixFactors fixes the issue, but perhaps PipeOpTargetTrafo should not drop the factor levels in the first place?
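A minimal sketch of the PipeOpFixFactors workaround, assuming it is placed directly before the learner inside the target trafo (the placement is a judgment call, not taken from the issue). To avoid depending on a particular built-in task, it uses a hypothetical toy task and a custom resampling whose training rows never contain level "C"; `df`, `task` and `resampling` are illustrative names.

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical toy data: level "C" of `town` only occurs in the test rows.
set.seed(1)
df = data.frame(
  y    = exp(rnorm(12)),  # strictly positive target, so log() is defined
  x    = rnorm(12),
  town = factor(rep(c("A", "B", "C"), each = 4))
)
task = as_task_regr(df, target = "y")

# Fixed split: train on the "A"/"B" rows, test on the "C" rows.
resampling = rsmp("custom")
resampling$instantiate(task, train_sets = list(1:8), test_sets = list(9:12))

# PipeOpFixFactors re-applies the training levels at prediction time,
# turning levels unseen during training into NA instead of failing.
graph = ppl("targettrafo",
  graph = po("fixfactors") %>>% lrn("regr.rpart"),
  targetmutate.trafo = function(x) log(x),
  targetmutate.inverter = function(x) list(response = exp(x$response)))

rr = resample(task, as_learner(graph), resampling)
rr$aggregate(msr("regr.mse"))
```

This only works because regr.rpart can handle the resulting missing values.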

The bike sharing gallery post also fails because of this (https://github.com/mlr-org/mlr3gallery/issues/119).

be-marc, Nov 20 '21 12:11

PipeOpFixFactors is not a solution for the gallery post: it introduces missing values (for levels not seen during training), which regr.kknn does not support.
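Why missing values appear can be illustrated with a rough base-R analogue of re-applying training levels at prediction time. This is an illustration of the idea, not PipeOpFixFactors' actual implementation, and the level values are hypothetical.

```r
train_levels = c("A", "B")            # levels seen during training
test_town = factor(c("A", "C", "B"))  # "C" is new at prediction time

# Re-encode the test factor with the training levels only:
# the unseen level "C" has no matching level and becomes NA.
fixed = factor(as.character(test_town), levels = train_levels)
anyNA(fixed)
```

A learner without support for missing values, such as regr.kknn, then cannot predict these rows.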

be-marc, Nov 20 '21 12:11

I think this could be fixed either "manually" in mlr3pipelines or directly in mlr3::convert_task, which seems to cause the problem: during the trafo, a new Task is created from the DataBackend of the original task, but during resampling some levels may no longer be present in the (training) data. They are then also missing from the backend and, as a result, from the new Task.
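One way a "manual" fix could look, sketched in base R: after the new data has been built, re-apply the factor levels of the original task. `restore_levels()` is a hypothetical helper, not part of mlr3 or mlr3pipelines; in mlr3 the reference levels could presumably be taken from `task$levels()`, which returns a named list of levels per factor column.

```r
# Hypothetical helper (not part of mlr3/mlr3pipelines): re-apply reference
# factor levels, e.g. those of the original task, to data rebuilt from a subset.
restore_levels = function(data, ref_levels) {
  for (col in intersect(names(ref_levels), names(data))) {
    # factor() with explicit levels keeps unused levels; values outside
    # ref_levels would become NA, so ref_levels must be a superset.
    data[[col]] = factor(as.character(data[[col]]), levels = ref_levels[[col]])
  }
  data
}

full = data.frame(town = factor(c("A", "B", "C")), y = 1:3)
ref_levels = lapply(Filter(is.factor, full), levels)  # named list: town

subset_data = droplevels(full[1:2, ])  # simulates the rebuilt training data
fixed = restore_levels(subset_data, ref_levels)
levels(fixed$town)                     # "C" is known again
```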

sumny, Nov 29 '21 17:11