
Reproducibility issue

Open janezlapajne opened this issue 8 months ago • 1 comments

Hello,

I noticed that results are not reproducible when using the library, i.e. the sklearn drop-in replacement classes produce slightly different results on each run.

For example, when using:

features_engineer = AutoFeatClassifier()
features_engineer.fit_transform(data_train.data, data_train.target.value)

the model calculates (or selects) different features each time.

I temporarily fixed the issue above by seeding both random number generators:

random.seed(seed)
np.random.seed(seed)

With this, the outputs produced by AutoFeatClassifier stay constant across runs.
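For reference, the seeding workaround can be wrapped in a small helper. This is only a sketch: `seed_everything` is a hypothetical name, not part of autofeat, and it only controls Python's `random` module and NumPy's legacy global generator.

```python
import random

import numpy as np


def seed_everything(seed: int) -> None:
    """Seed Python's and NumPy's global random number generators.

    Hypothetical helper for illustration; not part of autofeat.
    """
    random.seed(seed)
    np.random.seed(seed)


# Re-seeding before each run makes draws from the global generators repeat.
seed_everything(42)
first = (random.random(), np.random.rand(3).tolist())

seed_everything(42)
second = (random.random(), np.random.rand(3).tolist())

print(first == second)
```

Calling the helper immediately before each `fit_transform` call, rather than once at import time, keeps every run starting from the same generator state.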

However, when I tried using the following:

selector = FeatureSelector(verbose=self.verbose, problem_type="classification", featsel_runs=5)
selector.fit_transform(df_indices, target)

the seeding trick described above did not help: the selected features still change between runs.
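To narrow down where the remaining randomness comes from, a small check like the one below may help. It is a sketch under assumptions: `is_reproducible` is a hypothetical helper, and the two toy functions only stand in for library code. It illustrates one general way randomness can escape global seeding (a generator created without a seed); whether this is what happens inside FeatureSelector is not confirmed here.

```python
import random

import numpy as np


def is_reproducible(fn, seed=0, runs=3):
    """Return True if fn() gives identical results across re-seeded runs."""
    results = []
    for _ in range(runs):
        # Reset both global generators before every run.
        random.seed(seed)
        np.random.seed(seed)
        results.append(fn())
    return all(result == results[0] for result in results)


def uses_global_rng():
    # Draws from NumPy's global generator, so np.random.seed controls it.
    return np.random.permutation(10).tolist()


def uses_private_rng():
    # A Generator created without a seed pulls fresh OS entropy
    # and is not affected by np.random.seed.
    rng = np.random.default_rng()
    return rng.random(5).tolist()


print(is_reproducible(uses_global_rng))
print(is_reproducible(uses_private_rng))
```

With the selector from the post, `fn` could be something like `lambda: list(selector.fit_transform(df_indices, target).columns)`, assuming the call returns a DataFrame. Code that runs in separate worker processes is another place where global seeds set in the parent process may not apply.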

Is there an easy fix for this? Randomness must be introduced somewhere in the source.

janezlapajne avatar Oct 11 '23 09:10 janezlapajne