auto-sklearn

Update `{'include': {'data_preprocessor': [...] }}`

Open eddiebergman opened this issue 4 years ago • 1 comments

It appears that the code still accepts data_preprocessor as a valid entry in the estimators' include: Dict[str, Any] argument. When data_preprocessor: [] is passed, it states that the only valid entry is 'feature_type'.

from autosklearn.classification import AutoSklearnClassifier

AutoSklearnClassifier(
    time_left_for_this_task=30,
    include={
        'data_preprocessor': ['feature_type'],
    },
)

While this works fine, we also document here that the data preprocessor can't be turned off, and provide a broken example of how to turn it off here. The example is broken, as seen in the sprint statistics, which show only a single DummyModel returned.

This is also confusing because InputValidator also seems to perform OrdinalEncoding, which is a possible choice for the DataPreprocessor step in the pipeline.

In the short term I can think of two possibilities:

  • Allow a boolean switch, include: {'data_preprocessing': True/False}, that removes the pipeline step entirely.
  • Accept a string such as include: {'data_preprocessing': 'no_preprocessing'}, so that the pipeline step performs a no-op on the data.
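
To make the difference between the two options concrete, here is a minimal, library-independent sketch. Everything in it is hypothetical: build_preprocessing_steps, scale, identity and run are illustrative names, the key is spelled data_preprocessing, and none of this is auto-sklearn's actual API or pipeline code. It only shows that a boolean switch drops the step from the pipeline, while a 'no_preprocessing' string keeps the step but makes it a no-op.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A step is a (name, function) pair; functions map a list of floats to a list of floats.
Step = Tuple[str, Callable[[List[float]], List[float]]]


def scale(xs: List[float]) -> List[float]:
    """Stand-in for real preprocessing: divide by the maximum value."""
    top = max(xs) or 1.0  # avoid dividing by zero
    return [x / top for x in xs]


def identity(xs: List[float]) -> List[float]:
    """No-op step: returns the data unchanged."""
    return list(xs)


def build_preprocessing_steps(include: Optional[Dict[str, Any]] = None) -> List[Step]:
    """Hypothetical resolver for the two proposed `include` forms."""
    choice = (include or {}).get("data_preprocessing", True)
    if choice is False:
        # Option 1: boolean switch, the step is removed entirely.
        return []
    if choice == "no_preprocessing":
        # Option 2: the step stays in the pipeline but does nothing.
        return [("data_preprocessing", identity)]
    if choice is True:
        # Default: the usual preprocessing is applied.
        return [("data_preprocessing", scale)]
    raise ValueError(f"Unsupported data_preprocessing value: {choice!r}")


def run(steps: List[Step], xs: List[float]) -> List[float]:
    """Apply each step in order."""
    for _, fn in steps:
        xs = fn(xs)
    return xs
```

With option 1 the returned list is empty, so anything that inspects the pipeline no longer sees a data_preprocessing step; with option 2 the step is still present by name, which may be easier to keep compatible with code that expects a fixed pipeline layout.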

eddiebergman Oct 08 '21 12:10

The crashing example is addressed in #1269.

eddiebergman Oct 16 '21 15:10