AutoMLWhitebox
Incorrect feature history
All feature-history entries written by the Selector (metric, correlation and L1 selection) are overwritten by the last model refit.
Example:
```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from autowoe import AutoWoE

# With as_frame=True, y is a pandas Series named 'target'
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
test_aw = AutoWoE(n_jobs=1, debug=True)
test_aw.fit(pd.concat([X, y], axis=1), 'target')
```
If we look at `test_aw.feature_history`, we see many features with the reason 'Pruned during regression refit' and none of the selectors' reasons, even though we know the selectors did record their own reasons. This is how I checked it:
```python
# `Selector` is AutoWoE's internal feature-selector class; import it from the
# autowoe package (the exact module path may depend on the installed version).

# Reset the refit-pruning reasons so the selectors can record their own reasons
history = {k: None if v == 'Pruned during regression refit' else v for k, v in test_aw.feature_history.items()}

selector = Selector(
    interpreted_model=test_aw.params["interpreted_model"],
    task=test_aw.params["task"],
    train=test_aw.train_df,
    target=test_aw.target,
    features_type=test_aw.private_features_type,
    n_jobs=test_aw.params["n_jobs"],
    cv_split=test_aw._cv_split,
    features_mark_values=None,
)

# Rerun the selection stage with the same parameters the model was fitted with
best_features, _sel_result = selector(
    history,
    pearson_th=test_aw.params["pearson_th"],
    metric_th=test_aw.params["metric_th"],
    vif_th=test_aw.params["vif_th"],
    l1_grid_size=test_aw.params["l1_grid_size"],
    l1_exp_scale=test_aw.params['l1_exp_scale'],
    metric_tol=test_aw.params["metric_tol"],
)
```
If we now look at `history`, we can see several different drop reasons, not just the refit one.
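To compare `test_aw.feature_history` with the recomputed `history` more systematically, a small helper along these lines can tally the reasons and list the features whose reason changed. The helper names `reason_counts` and `diff_reasons` are hypothetical (not part of autowoe), and the sketch assumes the history is a plain dict mapping each feature name to a reason string or `None`.

```python
from collections import Counter


def reason_counts(feature_history):
    """Count how many features share each drop reason (None means the feature was kept)."""
    return Counter(feature_history.values())


def diff_reasons(before, after):
    """Return {feature: (old_reason, new_reason)} for every feature whose reason differs."""
    return {
        feat: (before.get(feat), after.get(feat))
        for feat in before.keys() | after.keys()
        if before.get(feat) != after.get(feat)
    }
```

For example, `diff_reasons(test_aw.feature_history, history)` would list the features whose 'Pruned during regression refit' entry was replaced by a selector reason.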
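My suggestion is to change the third argument of the last `feature_changing()` call in `AutoWoE.fit()` from `self._private_features_type` to `best_features`. At the moment `feature_changing()` treats every feature that was passed into the selectors as `features_before` for the last refit, so each feature the selectors had already dropped is relabeled 'Pruned during regression refit'.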
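The effect of the proposed change can be illustrated with a toy model of the bookkeeping. The `feature_changing()` below is a hypothetical stand-in, not the autowoe source: it assumes the function marks every feature present in `features_before` but absent from `features_after` with the given reason, and its parameter order is likewise assumed. Feature names and selector reasons are made up.

```python
# Hypothetical stand-in for AutoWoE's feature_changing(); NOT the library source.
# Assumption: it overwrites the reason of every feature dropped between
# features_before and features_after.
REFIT_REASON = "Pruned during regression refit"


def feature_changing(history, reason, features_before, features_after):
    """Set `reason` for each feature in features_before that is missing from features_after."""
    kept = set(features_after)
    for feat in features_before:
        if feat not in kept:
            history[feat] = reason
    return history


# Toy state after the selectors ran (made-up names and reasons)
all_features = ["f1", "f2", "f3", "f4"]  # everything passed into the selectors
history = {"f1": "Correlation", "f2": "L1", "f3": None, "f4": None}
best_features = ["f3", "f4"]  # survivors of the selectors
final_features = ["f4"]  # survivors of the last refit

# Current behaviour: all selector inputs act as features_before,
# so the selector reasons for f1 and f2 are overwritten
buggy = feature_changing(dict(history), REFIT_REASON, all_features, final_features)

# Proposed behaviour: only the selector survivors act as features_before,
# so only f3 gets the refit reason
fixed = feature_changing(dict(history), REFIT_REASON, best_features, final_features)

print(buggy)
print(fixed)
```

Passing a copy (`dict(history)`) to each call keeps the two scenarios independent, since the stand-in mutates the dict it receives.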