
Multi-output regression support

Open pplonski opened this issue 3 years ago • 7 comments

pplonski avatar Mar 25 '21 08:03 pplonski

Support for multi-output regression would be great! In a project, I am currently wrapping the AutoML instance with the sklearn.multioutput models to achieve multi-output fitting. This nearly works. There are only two problems:

  1. Since the models are trained consecutively, the results_path is no longer empty after the first model is fit, so training of the subsequent models is aborted.
  2. While multi-output regression works (with results_path not set), multi-output classification fails, because sklearn tries to access AutoML._classes, which does not exist. I don't know if that is solvable.

PeterLuenenschloss avatar Apr 01 '22 10:04 PeterLuenenschloss

@PeterLuenenschloss an additional argument, multi_output=True, should be added to the AutoML constructor to tell the AutoML object that it is going to be trained in a multi-output setting. The final results can be saved in nested directories. For example:

automl = AutoML(results_path="AutoML_multi", multi_output=True)
clf = MultiOutputClassifier(automl).fit(X,Y)

There will be paths:

  • AutoML_multi/AutoML_1
  • AutoML_multi/AutoML_2
  • AutoML_multi/AutoML_3
  • and so on, till the number of targets

How do predictions work in MultiOutputClassifier? Does it keep all objects in RAM?

pplonski avatar Apr 01 '22 11:04 pplonski

There will be paths:

  • AutoML_multi/AutoML_1
  • AutoML_multi/AutoML_2
  • AutoML_multi/AutoML_3
  • and so on, till the number of targets

Yes, that's how I thought it should be, too!

an additional argument, multi_output=True, should be added to the AutoML constructor

Maybe it is worth supporting not only simple multi-output fitting but also chained regression (or even defaulting to it) by wrapping with sklearn's RegressorChain. In that case, an additional keyword, order, would be needed to allow altering the default chain order. The AutoML results folder would also need to store the model order mapping somehow, so that the trained AutoML models can be associated with the target indices.
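The order mapping mentioned above could be stored in a small file next to the per-target model folders. The following is only a sketch of that idea, not part of mljar-supervised: the file name chain_order.json, the helper names, and the AutoML_<n> folder naming are assumptions made for illustration. It follows the convention of sklearn's RegressorChain, where order[i] is the index of the target column predicted by the i-th model in the chain.

```python
import json
from pathlib import Path


def chain_order_mapping(order):
    """Map each chain position to the target column that model predicts."""
    if sorted(order) != list(range(len(order))):
        raise ValueError("order must be a permutation of 0..n_targets-1")
    return {position: target for position, target in enumerate(order)}


def save_chain_order(results_path, order):
    """Write the position-to-target mapping into the results folder."""
    mapping = chain_order_mapping(order)
    folder = Path(results_path)
    folder.mkdir(parents=True, exist_ok=True)
    out_file = folder / "chain_order.json"  # hypothetical file name
    records = [
        # model_dir follows the AutoML_<n> layout proposed in this thread
        {"position": p, "target_index": t, "model_dir": f"AutoML_{p + 1}"}
        for p, t in mapping.items()
    ]
    out_file.write_text(json.dumps(records, indent=2))
    return out_file


def load_chain_order(results_path):
    """Read the mapping back and return the chain order as a list."""
    text = (Path(results_path) / "chain_order.json").read_text()
    records = sorted(json.loads(text), key=lambda r: r["position"])
    return [r["target_index"] for r in records]
```

With order=[2, 0, 1], the model in AutoML_1 would predict target column 2, the model in AutoML_2 target column 0, and so on. Loading the file restores the same order, so predictions can be put back into the original column layout.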

How do predictions work in MultiOutputClassifier? Does it keep all objects in RAM?

Yes, the wrapper trains one model per target dimension and combines the fitted models into a single model whose prediction is the array of the single-target predictions, ordered by target. As far as I can tell, the model instances are kept in RAM; I can't see any explicit writing to disk. The problem with the classifier wrapper is that, after fitting, it tries to collect the prediction classes from the fitted single-target models by accessing each model's ._classes attribute, which fitted AutoML models do not implement (but which other sklearn-style models, such as XGBoost, do). MultiOutputClassifier does this only to assign the list of collected classes to the _classes attribute of the resulting multi-output model.
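The behaviour described above can be illustrated with a small stand-in. This is a simplified sketch, not scikit-learn's implementation: TinyMultiOutputClassifier, MajorityClassifier and NoClassesClassifier are made-up names, and a factory function replaces scikit-learn's cloning of the estimator. Note that in scikit-learn the fitted attribute is spelled classes_ (trailing underscore), which is the spelling used here.

```python
class TinyMultiOutputClassifier:
    """Simplified per-target wrapper (illustration only)."""

    def __init__(self, make_estimator):
        # make_estimator: zero-argument factory returning a fresh estimator
        self.make_estimator = make_estimator

    def fit(self, X, Y):
        n_targets = len(Y[0])
        self.estimators_ = []
        for j in range(n_targets):
            y_j = [row[j] for row in Y]  # j-th target column
            estimator = self.make_estimator()
            estimator.fit(X, y_j)
            self.estimators_.append(estimator)  # kept in memory
        # This step fails with AttributeError if a fitted estimator
        # does not expose classes_.
        self.classes_ = [est.classes_ for est in self.estimators_]
        return self

    def predict(self, X):
        per_target = [est.predict(X) for est in self.estimators_]
        # Transpose: one row per sample, one column per target.
        return [list(row) for row in zip(*per_target)]


class MajorityClassifier:
    """Toy estimator that always predicts the most frequent class."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.majority_ = max(self.classes_, key=list(y).count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


class NoClassesClassifier(MajorityClassifier):
    """Toy estimator that does not publish classes_ after fit."""

    def fit(self, X, y):
        self.majority_ = max(sorted(set(y)), key=list(y).count)
        return self
```

Fitting the wrapper with MajorityClassifier succeeds, while fitting it with NoClassesClassifier raises AttributeError in the last step of fit, even though every per-target model was trained successfully. That matches the failure reported for classification in this thread.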

PeterLuenenschloss avatar Apr 02 '22 13:04 PeterLuenenschloss

I cannot find 'multi_output' in the source code or the documentation. Could you explain how I can use multi-output regression for my tabular data?

xinlnix avatar Aug 09 '22 14:08 xinlnix

@xinlnix it is not yet implemented.

pplonski avatar Aug 09 '22 17:08 pplonski

A built-in implementation would be great, but for others who need this in the meantime, the following seems to work and returns multi-output predictions.

from sklearn.multioutput import MultiOutputRegressor
from supervised.automl import AutoML

automl = AutoML(mode="Explain")
clf = MultiOutputRegressor(automl).fit(x_train, y_train)
predictions = clf.predict(x_test)

RaymondWKWong avatar Aug 11 '22 15:08 RaymondWKWong

A built-in implementation would be great, but for others who need this in the meantime, the following seems to work and returns multi-output predictions.

automl = AutoML(mode="Explain")
clf = MultiOutputRegressor(automl).fit(x_train, y_train)
predictions = clf.predict(x_test)

For me, this approach tries to fit the same model again:

X_train.shape, X_test.shape, y_train.shape, y_test.shape
((2492, 500), (623, 500), (2492, 3), (623, 3))
automl = AutoML(mode="Explain", results_path=model_path)
reg = MultiOutputRegressor(automl).fit(X_train, y_train)

This model has already been fitted. You can use predict methods or select a new 'results_path' for a new 'fit()'.
This model has already been fitted. You can use predict methods or select a new 'results_path' for a new 'fit()'.
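One possible way around the message above is to skip the sklearn wrapper and fit one model per target column, giving each model its own results directory. The sketch below is an assumption-laden illustration rather than a feature of mljar-supervised: fit_per_target, predict_per_target and the make_model factory are hypothetical helpers, Y is assumed to be a 2-D array or list of rows, and the directory names follow the AutoML_multi/AutoML_<n> layout proposed earlier in this thread.

```python
from pathlib import Path


def fit_per_target(make_model, X, Y, base_path):
    """Fit one model per target column, each with its own results path.

    make_model is a hypothetical factory: it takes a path string and
    returns an unfitted model with fit() and predict() methods.
    """
    models = []
    n_targets = len(Y[0])
    for j in range(n_targets):
        y_j = [row[j] for row in Y]  # j-th target column
        model_path = Path(base_path) / f"AutoML_{j + 1}"
        model = make_model(str(model_path))
        model.fit(X, y_j)
        models.append(model)
    return models


def predict_per_target(models, X):
    """Combine single-target predictions into rows of n_targets values."""
    per_target = [list(model.predict(X)) for model in models]
    return [list(row) for row in zip(*per_target)]
```

With mljar-supervised, the factory could be something like lambda path: AutoML(mode="Explain", results_path=path), assuming AutoML creates each directory itself and accepts the target column in the form passed here. Because every target gets a fresh results_path, no model should find an already fitted model in its directory.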

Karlheinzniebuhr avatar Oct 01 '22 15:10 Karlheinzniebuhr