
Incorrect feature history

Open natss opened this issue 2 years ago • 0 comments

All feature history entries written by the Selector (Metrics, Correlation and L1) are overwritten by the last model refit.

Example:

import pandas as pd  # needed for pd.concat below
from sklearn.datasets import load_breast_cancer
from autowoe import AutoWoE

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

test_aw = AutoWoE(n_jobs=1, debug=True)
test_aw.fit(pd.concat([X, y], axis=1), 'target')

If we look at test_aw.feature_history, we see many 'Pruned during regression refit' reasons and nothing from the selectors. But we know for certain that selector reasons exist. Here is how I checked:

history = {k: None if v == 'Pruned during regression refit' else v for k, v in test_aw.feature_history.items()}

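# Import path assumed from the package layout; adjust if it differs in your version.
from autowoe.lib.selectors.selector import Selector

selector = Selector(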
    interpreted_model=test_aw.params["interpreted_model"],
    task=test_aw.params["task"],
    train=test_aw.train_df,
    target=test_aw.target,
    features_type=test_aw.private_features_type,
    n_jobs=test_aw.params["n_jobs"],
    cv_split=test_aw._cv_split,
    features_mark_values=None,
)

best_features, _sel_result = selector(
    history,
    pearson_th=test_aw.params["pearson_th"],
    metric_th=test_aw.params["metric_th"],
    vif_th=test_aw.params["vif_th"],
    l1_grid_size=test_aw.params["l1_grid_size"],
    l1_exp_scale=test_aw.params['l1_exp_scale'],
    metric_tol=test_aw.params["metric_tol"],
)

If we look at history now, we can see several different drop reasons.
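To make the comparison easier to read, the reasons can be tallied. This is only a small sketch: summarize_drop_reasons is a hypothetical helper, not part of autowoe, example_history is made-up data standing in for the real dict, and it assumes None means no drop reason was recorded for that feature.

```python
from collections import Counter


def summarize_drop_reasons(history):
    """Count how many features carry each drop reason.

    Assumption: a value of None means no drop reason was recorded.
    """
    return Counter(
        "no reason recorded" if reason is None else reason
        for reason in history.values()
    )


# Made-up data for illustration only; the reason strings are placeholders.
example_history = {
    "f1": None,
    "f2": "reason A",
    "f3": "reason A",
    "f4": "reason B",
}

summary = summarize_drop_reasons(example_history)
for reason, count in summary.most_common():
    print(f"{count:3d}  {reason}")
```

Running the same helper on test_aw.feature_history and on history should show the difference directly.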

My suggestion is to change the third argument of the last feature_changing() call in AutoWoE.fit() from self._private_features_type to best_features. As it stands, feature_changing() treats every feature that was passed into the selectors as features_before for the last refit, so features the selectors already dropped get relabeled.
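The effect can be reproduced in isolation with a toy model. This is a sketch under assumptions, not the library's code: mark_dropped is a hypothetical stand-in for feature_changing(), assumed to record a reason for every feature in features_before that is missing from features_after, and the feature names and the selector reason string are invented.

```python
def mark_dropped(history, reason, features_before, features_after):
    """Hypothetical stand-in for feature_changing().

    Returns a copy of history in which every feature present in
    features_before but absent from features_after gets `reason`.
    """
    after = set(features_after)
    updated = dict(history)
    for feature in features_before:
        if feature not in after:
            updated[feature] = reason
    return updated


REFIT_REASON = "Pruned during regression refit"

all_features = ["a", "b", "c", "d"]  # everything passed into the selectors
best_features = ["a", "b", "c"]      # "d" was dropped by a selector
final_features = ["a", "b"]          # "c" was pruned by the refit

# History after the selectors ran; the reason string is illustrative.
after_selectors = {"a": None, "b": None, "c": None, "d": "Dropped by selector"}

# Current behaviour: features_before is the full input feature set.
buggy = mark_dropped(after_selectors, REFIT_REASON, all_features, final_features)

# Suggested behaviour: features_before is what the selectors kept.
fixed = mark_dropped(after_selectors, REFIT_REASON, best_features, final_features)

print(buggy["d"])
print(fixed["d"])
```

With the full feature set as features_before, the selector's reason for "d" is replaced by the refit reason. With best_features, only "c" is marked as pruned by the refit and the reason for "d" survives.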

natss, Jul 13 '22 15:07