mlr3pipelines

Missing PipeOpRowApply

Open mb706 opened this issue 4 years ago • 2 comments

Like PipeOpColApply, but for rows (of a given type).

mb706, Feb 12 '20 09:02

What exactly should this PipeOp do?

I want something that computes a row-wise mean or a row-wise normalization by simply specifying a function that does this. It should roughly translate to t(apply(data, 1, <applicator>)). E.g.

# row-wise centering through `scale(x, scale = FALSE)`
# (proposed usage of the requested "rowapply" PipeOp)
library(mlr3)
library(mlr3pipelines)
op = po("rowapply", applicator = function(x) c(scale(x, scale = FALSE)))
task = op$train(list(tsk("iris")))[[1]]

This should have the same effect as:

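library(mlr3)
task = tsk("iris")
data = task$data(cols = task$feature_names)
# apply() returns one column per input row, so transpose back to rows x features
res = t(apply(data, 1, function(x) c(scale(x, scale = FALSE))))
colnames(res) = colnames(data)  # c(scale(x)) drops the feature names
# cbind() needs a data.frame-like object; same-named columns replace the originals
task$cbind(data.table::as.data.table(res))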
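
For reference, the base R behaviour that the t(apply(data, 1, <applicator>)) idea relies on can be checked on a tiny matrix. This is only a sketch: the names `m` and `center` are made up for illustration and nothing here is mlr3pipelines API.

```r
# Small numeric matrix with two rows and three named columns.
m = matrix(c(1, 2, 3,
             4, 6, 8), nrow = 2, byrow = TRUE,
           dimnames = list(NULL, c("a", "b", "c")))

# Row-wise centering; c() strips the matrix attributes that scale() adds.
center = function(x) c(scale(x, scale = FALSE))

raw = apply(m, 1, center)         # one COLUMN per input row: 3 x 2
centered = t(raw)                 # transpose back to 2 x 3
colnames(centered) = colnames(m)  # names were dropped by c(scale(x))

centered
# row 1: -1 0 1
# row 2: -2 0 2
```

The transpose and the restored column names are the two steps that are easy to forget when moving from a column-wise to a row-wise apply.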

To Do:

  • Start from PipeOpColApply
  • Add select_cols = selector_type(c("numeric", "integer")). I don't think we can operate on anything else.
  • The last line of transform_dt should then probably be t(apply(dt, 1, applicator)), where dt is the data that transform_dt receives.
  • I don't know if we can remove the train_dt and predict_dt that are in PipeOpColApply, or if they are needed in PipeOpRowApply. See the following tests and check if we can go without them.
  • Write lots of tests to check if this works. In particular, what happens with
    • empty task (no columns)
    • training task has only integer features, only numeric features, has both integer and numeric features
    • the above, where the applicator is one of as.integer, as.numeric
    • the above, but predict task has 0 rows
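
The edge cases in the list above can be explored without any PipeOp at all. The following is a rough base R sketch, not the actual implementation: `row_apply` is a hypothetical helper, and returning the input unchanged for zero rows or zero columns is an assumption about the desired behaviour.

```r
# Hypothetical helper (not part of mlr3pipelines): apply `applicator` to each
# row of a data.frame that contains only numeric / integer columns.
row_apply = function(df, applicator) {
  stopifnot(is.data.frame(df), is.function(applicator))
  # Assumption: with no columns or no rows there is nothing to apply, so the
  # input is returned unchanged instead of being passed to apply().
  if (ncol(df) == 0L || nrow(df) == 0L) return(df)
  m = as.matrix(df)  # integer-only input stays integer, mixed input becomes double
  # One applicator call per row; rbind() stacks the results as output rows.
  res = do.call(rbind, lapply(seq_len(nrow(m)), function(i) applicator(m[i, ])))
  # Assumption: reuse the feature names only if the column count is unchanged.
  if (ncol(res) == ncol(m)) colnames(res) = colnames(m)
  as.data.frame(res)
}

ints  = data.frame(a = 1:2, b = 3:4)            # only integer features
mixed = data.frame(a = 1:2, b = c(0.5, 1.5))    # integer and numeric features

str(row_apply(ints, as.numeric))     # integer input, numeric applicator
str(row_apply(mixed, as.integer))    # mixed input, integer applicator
str(row_apply(ints[0, ], as.integer))  # "predict" data with 0 rows
```

Each `str()` call corresponds to one of the test cases listed; a real test suite would run the same combinations through the PipeOp's train and predict steps.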

mb706, Feb 12 '20 10:02

  • Should have an additional untyped parameter "name_prefix" (custom_check = check_string), initialized to "". If given, this string is prefixed to the names of the generated columns: with "" nothing changes; with "X", every column name is prefixed by "X.".
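
The intended name_prefix semantics could look roughly like this. `prefix_names` is a hypothetical helper used only for illustration, and the base R checks stand in for the check_string validation mentioned above.

```r
# Hypothetical helper illustrating the proposed `name_prefix` behaviour.
prefix_names = function(cols, name_prefix = "") {
  # Stand-in for check_string: a single, non-missing character value.
  stopifnot(is.character(name_prefix), length(name_prefix) == 1L, !is.na(name_prefix))
  # Default "" leaves names untouched; the length check avoids paste()
  # turning an empty column set into the single name "X.".
  if (name_prefix == "" || length(cols) == 0L) return(cols)
  paste(name_prefix, cols, sep = ".")
}

prefix_names(c("Sepal.Length", "Sepal.Width"))       # names unchanged
prefix_names(c("Sepal.Length", "Sepal.Width"), "X")  # "X.Sepal.Length" "X.Sepal.Width"
prefix_names(character(0), "X")                      # character(0)
```

Whether prefixed columns should replace the original features or be added next to them is not settled by the description above and would need to be decided in the PipeOp itself.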

mb706, Feb 12 '20 11:02