auto-sklearn
Update `{'include': {'data_preprocessor': [...] }}`
It appears the code still accepts data_preprocessor as a valid key in the include: Dict[str, Any] argument of the estimators.
It states that the only valid entry is 'feature_type' when passing data_preprocessor: []
from autosklearn.classification import AutoSklearnClassifier

AutoSklearnClassifier(
    time_left_for_this_task=30,
    include={
        'data_preprocessor': ['feature_type']
    }
)
While this works fine, we also document that it can't be turned off here and provide a broken example of how to turn it off here. It's broken, as seen in the `sprint_statistics()` output, which shows only a single DummyModel returned.
This is also confusing because InputValidator also seems to do OrdinalEncoding, which is a possible choice for the DataPreprocessor step in the pipeline.
In the short term I can think of two possibilities:
- Allow a boolean switch, include: {'data_preprocessing' : True/False}, removing the pipeline step entirely
- Include a string like include: {'data_preprocessing' : 'no_preprocessing'}, having the pipeline step perform a no-op on the data.
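To make the difference between the two options concrete, here is a rough, library-free sketch. Everything in it is hypothetical: `NoPreprocessing`, `build_steps`, the `'data_preprocessing'` key, and the placeholder step values are illustrative names, not part of auto-sklearn's actual pipeline code.

```python
from typing import Any, Dict, List, Optional, Tuple


class NoPreprocessing:
    """Hypothetical no-op step: learns nothing and returns the data unchanged."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


def build_steps(include: Optional[Dict[str, Any]] = None) -> List[Tuple[str, Any]]:
    """Hypothetical helper that assembles (name, step) pairs from `include`."""
    include = include or {}
    # Assumed default: preprocessing stays on unless explicitly changed.
    choice = include.get("data_preprocessing", True)

    steps: List[Tuple[str, Any]] = []
    if choice is False:
        # Option 1: boolean switch, the step is removed from the pipeline.
        pass
    elif choice == "no_preprocessing":
        # Option 2: the step stays in place but performs a no-op.
        steps.append(("data_preprocessor", NoPreprocessing()))
    else:
        # Placeholder string standing in for the existing 'feature_type' choice.
        steps.append(("data_preprocessor", "feature_type"))

    # Placeholder for whatever estimator follows in the real pipeline.
    steps.append(("classifier", "placeholder"))
    return steps
```

With option 1 the pipeline has one step fewer, while option 2 keeps the pipeline shape stable, which may be simpler for code that looks steps up by name.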
The crash in the example is addressed in #1269.