Deep-Forest
[BUG] cannot correctly clone `CascadeForestRegressor` with `sklearn.base.clone` when using customized estimators
Describe the bug
Cannot correctly clone a `CascadeForestClassifier`/`CascadeForestRegressor` object with `sklearn.base.clone` when using customized estimators.
To Reproduce
```python
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.base import clone
from deepforest import CascadeForestRegressor
import xgboost as xgb
import lightgbm as lgb

X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = CascadeForestRegressor(random_state=1)

# set estimator
n_estimators = 4  # the number of base estimators per cascade layer
estimators = [lgb.LGBMRegressor(random_state=i) for i in range(n_estimators)]
model.set_estimator(estimators)

# set predictor
predictor = xgb.XGBRegressor()
model.set_predictor(predictor)

# clone model
model_new = clone(model)

# try to fit
model.fit(X_train, y_train)
```
Expected behavior
No error.
Additional context
```
~/miniconda3/envs/pycaret/lib/python3.8/site-packages/deep_forest-0.1.5-py3.8-linux-x86_64.egg/deepforest/cascade.py in fit(self, X, y, sample_weight)
   1004         if not hasattr(self, "predictor_"):
   1005             msg = "Missing predictor after calling `set_predictor`"
-> 1006             raise RuntimeError(msg)
   1007
   1008         binner_ = Binner(

RuntimeError: Missing predictor after calling `set_predictor`
```
This bug occurs because, when a model with a customized predictor or customized estimators is cloned, the `predictor='custom'` flag is cloned, while `self.predictor_`/`self.dummy_estimators` are not, which introduces the error described above.
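To illustrate why this happens, here is a minimal, self-contained sketch of the clone semantics involved (`ToyCascade` and `toy_clone` are hypothetical stand-ins, not the real deepforest or sklearn code): a parameter-based clone re-constructs the object from its constructor parameters only, so state attached later via a setter method is silently dropped.

```python
class ToyCascade:
    """Toy stand-in for CascadeForestRegressor (hypothetical, for illustration)."""

    def __init__(self, predictor="forest"):
        self.predictor = predictor        # constructor param: survives cloning

    def set_predictor(self, predictor_obj):
        self.predictor = "custom"         # flag stored as a constructor param
        self.predictor_ = predictor_obj   # actual object: NOT a param, lost on clone


def toy_clone(est):
    # Mimics sklearn.base.clone: re-instantiate from constructor params only.
    return type(est)(predictor=est.predictor)


model = ToyCascade()
model.set_predictor(object())
cloned = toy_clone(model)

print(cloned.predictor)               # 'custom'
print(hasattr(cloned, "predictor_"))  # False -> fit() raises the RuntimeError
```

The clone ends up in an inconsistent state: it claims to have a custom predictor (`predictor='custom'`) but no longer holds one.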
I think this bug can be easily fixed by making the predictor and the list of estimators constructor parameters of `CascadeForestClassifier`/`CascadeForestRegressor`, just like other meta-estimators do (e.g. ngboost), but the corresponding APIs would have to be changed.
For example, the API parameters could be:
```python
model = CascadeForestRegressor(
    estimators=[lgb.LGBMRegressor(random_state=i) for i in range(n_estimators)],
    predictor=xgb.XGBRegressor(),
)
```
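For reference, a minimal sketch of why the constructor-parameter approach fixes the cloning problem (`CloneFriendlyCascade` and `param_clone` are hypothetical names for illustration, not the real deepforest or sklearn API): when the estimators and predictor live in the constructor signature, a parameter-based clone reproduces them automatically.

```python
class CloneFriendlyCascade:
    """Toy illustration of the proposed API; not the real deepforest class."""

    def __init__(self, estimators=None, predictor=None):
        self.estimators = estimators
        self.predictor = predictor

    def get_params(self):
        # Everything a clone needs is exposed as a constructor parameter.
        return {"estimators": self.estimators, "predictor": self.predictor}


def param_clone(est):
    # Mimics sklearn.base.clone: rebuild the object from its params.
    return type(est)(**est.get_params())


model = CloneFriendlyCascade(estimators=["est0", "est1"], predictor="xgb")
model_new = param_clone(model)

print(model_new.estimators)  # ['est0', 'est1']
print(model_new.predictor)   # 'xgb'
```

With this layout there is no hidden `predictor_` attribute to lose, so `clone(model)` and a subsequent `fit` would behave consistently.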
Thanks for reporting, will take a look during the weekend.