mlr3proba icon indicating copy to clipboard operation
mlr3proba copied to clipboard

Move non-proba specific pipelines to mlr3pipelines

Open RaphaelS1 opened this issue 5 years ago • 5 comments

@pfistfl below are a list of non-proba specific pipelines I could move to mlr3pipelines, if there are any you disagree with just comment and I'll delete them from the list - we can always rename them as you want, I've included a brief description for each.

  • [ ] PipeOpTransformer - Abstract pipeop for transformers, a transformer changes an object of the same type to a different class, e.g. PipeOpPredictionRegr to PipeOpPredictionSurv. Current implementation has a single private method called .transform.
  • [ ] PipeOpPredTransformer - Abstract pipeop for prediction transformers. Training input and outputs are NULL. Outputs are specialised in sub-classes but should be a single object of the prediction object to transform to. .train returns list(NULL) and .predict calls .transform.
  • [ ] PipeOpTaskTransformer - Abstract pipeop for task transformers. Inputs and outputs in training and predicting are identical. But in most cases in predict the transformer may do much less, e.g. in training the target and features may be transformed but in prediction possibly only the features are changed - specialised by sub-classes.
  • [ ] PipeOpCompositor - Abstract pipeop for prediction composition. Composition is the name given to creating a new (or overwritten) predict_type from others. The output will be whatever the specialised prediction class is. The input could be one or more prediction objects to create the new predict_type from. This class is not strictly required as sub-classes will have little in common and could be unified using @family but this cleans it up a little.

RaphaelS1 avatar Jul 23 '20 18:07 RaphaelS1

Hey,

thanks for writing things up, I'll go over things:

  • Just a general comment, is the further inheritance step i.e. PipeOpTransformer necessary?

  • PipeOpTaskTransformer : This is already implemented in PipeOpUpdateTarget (https://github.com/mlr-org/mlr3pipelines/blob/master/R/PipeOpTrafo.R). We talked about this last week, and @mllg will add parts of it (An extended convert_task function) to mlr3. We perhaps should also create a general Prediction converter to cover the PipeOpPredictionTransformer case. In general, I would wait for those things to be implemented in mlr3 and then we could try to bring together the two existing PipeOps.

  • PipeOpPredictionTransformer: This currently does not exist in mlr3pipelines, but we perhaps should have parts of the base infrastructure in mlr3 (see above). This should move to mlr3pipelines, as it will also be needed in mlr3ordinal.

  • [ ] PipeOpCompositor: I agree. Basically all PredictionAvg | Ensemble etc. PipeOp's would belong to this case? I see your point; Additionally, for some compositions we might e.g. need factor levels (i.e. one-vs-rest classification, ordinal) together with a set of predictions. Not sure what the best abstraction principle here is.

pfistfl avatar Jul 26 '20 13:07 pfistfl

Just a general comment, is the further inheritance step i.e. PipeOpTransformer necessary?

Strictly? No. Again can just collect these using @family but the current implementation is similar to PipeOpEnsemble in that it defines all child classes by specifying a single private .transform method which is then always called by .train and .predict so that child classes only need to edit .transform.

This is already implemented in PipeOpUpdateTarget (https://github.com/mlr-org/mlr3pipelines/blob/master/R/PipeOpTrafo.R). We talked about this last week, and @mllg will add parts of it (An extended convert_task function) to mlr3.

I think this pipeop is currently too specific and doesn't generalise to complex cases. e.g. both my PipeOpTaskRegrSurv and PipeOpTaskSurvRegr require two inputs, and some will change/add to features as well. In the case of converting from one class to another many possible strategies are required and cannot be over-simplified in the parent class.

PipeOpCompositor: I agree. Basically all PredictionAvg | Ensemble etc. PipeOp's would belong to this case? I see your point; Additionally, for some compositions we might e.g. need factor levels (i.e. one-vs-rest classification, ordinal) together with a set of predictions. Not sure what the best abstraction principle here is.

Exactly. Though in the simpler case like in the distr6 ones it's just a case of applying a simple function to a PredictionSurv and returning an identical object but with an additional column.

In general, I would wait for those things to be implemented in mlr3 and then we could try to bring together the two existing PipeOps.

Makes sense.

RaphaelS1 avatar Jul 27 '20 09:07 RaphaelS1

I created an issue in mlr3.

The current goal was to use a S3 convert_task function, that can be extended by the different packages.

Perhaps also see https://github.com/mlr-org/mlr3/issues/470 for my thoughts on this problem.

I think this pipeop is currently too specific and doesn't generalise to complex cases. Then we should perhaps try to extend it in a way that makes it flexible enough :) If this does proof to be hard / impossible though, we could also copy them over.

From a user perspective, I would try to keep the number of PipeOps as small as possible.

pfistfl avatar Jul 27 '20 09:07 pfistfl

See https://github.com/mlr-org/mlr3/pull/529/files;

In general, e.g. for future extensibility (consider e.g. a future TaskSurvSpatioTemp, we should try to have common task conversion functionality which ensures everything relevant is copied over. (We can create different PipeOps on top, but we should perhaps use common base functionality that ensures col_roles, row_roles, other args. etc. are correctly copied over.

pfistfl avatar Jul 27 '20 13:07 pfistfl

So, now that convert_task is part of base mlr3, should we try to homogenize / move pipeops to pipelines here? I guess comments from @mb706 and @sumny would be welcome?

pfistfl avatar Sep 14 '20 12:09 pfistfl