
Reproducibility issue

Open janezlapajne opened this issue 2 years ago • 2 comments

Hello,

I noticed that results are not reproducible when using the library: the sklearn drop-in replacement classes produce slightly different results on each run.

For example, when using:

features_engineer = AutoFeatClassifier()
features_engineer.fit_transform(data_train.data, data_train.target.value)

it will calculate (or select) different features each time.

I temporarily worked around the issue above by setting:

 random.seed(seed)
 np.random.seed(seed)

so that the outputs produced by AutoFeatClassifier stay constant across runs.
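The seeding workaround above can be wrapped in a small helper. This is only a minimal sketch and not part of autofeat: `seed_everything` is a hypothetical name, and the NumPy import is optional so the snippet runs with or without NumPy installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the global RNG of the standard library and, if installed, NumPy."""
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        return  # NumPy is not installed; only the stdlib RNG was seeded
    np.random.seed(seed)


# Reseeding with the same value replays the same random draw.
seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)
```

Calling the helper once, right before `fit_transform`, mirrors the two seed lines shown above.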

However, when I tried using the following:

selector = FeatureSelector(verbose=self.verbose, problem_type="classification", featsel_runs=5)
selector.fit_transform(df_indices, target)

the seed-setting trick above did not help: the selected features still change between runs.

Is there an easy fix for this? Randomness must be introduced somewhere in the source.
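One way to confirm whether a step is reproducible is to reseed, rerun it a few times, and compare the outputs. The sketch below is a generic check, not autofeat code: `is_reproducible` and `pick_features_stub` are hypothetical names, and the stub only stands in for a feature selection call. It reseeds the stdlib RNG only; for the real selector one would presumably seed NumPy as well.

```python
import random
from typing import Any, Callable


def is_reproducible(run: Callable[[], Any], seed: int, repeats: int = 3) -> bool:
    """Call `run` several times, reseeding before each call, and compare results."""
    results = []
    for _ in range(repeats):
        random.seed(seed)
        results.append(run())
    return all(r == results[0] for r in results)


def pick_features_stub() -> list:
    # Hypothetical stand-in for a selection step: samples 3 of 10 column names.
    columns = [f"x{i}" for i in range(10)]
    return sorted(random.sample(columns, 3))


print(is_reproducible(pick_features_stub, seed=0))
```

If a step still fails this check after seeding, its randomness comes from a source that the global seed does not control.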

janezlapajne, Oct 11 '23 09:10

Also, I just noticed that if the number of cores used (the n_jobs parameter) is >1, the results are not reproducible for the first scenario either. So the results are reproducible if n_jobs==1 and stochastic if n_jobs==-1.
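A plausible explanation, which is an assumption and not confirmed in this thread, is that parallel workers do not draw from the seeded global generator in a fixed order. A common general pattern for this is to give every task its own generator derived from a master seed. The sketch below illustrates that pattern with the standard library only. It is not how autofeat is implemented, and `noisy_score` and `run_all` are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def noisy_score(task_seed: int) -> float:
    # Each task owns a private generator instead of sharing the global RNG.
    rng = random.Random(task_seed)
    return sum(rng.random() for _ in range(100))


def run_all(master_seed: int, n_tasks: int, n_workers: int) -> list:
    # Derive one seed per task, so the result does not depend on which worker runs it.
    task_seeds = [master_seed + i for i in range(n_tasks)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in input order, regardless of completion order.
        return list(pool.map(noisy_score, task_seeds))


serial = run_all(master_seed=7, n_tasks=5, n_workers=1)
parallel = run_all(master_seed=7, n_tasks=5, n_workers=4)
print(serial == parallel)
```

With per-task seeds, the output is the same for one worker or several, which is the property missing in the n_jobs==-1 case described above.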

Cheers.

janezlapajne, Oct 14 '23 18:10

Hey @janezlapajne, thanks for pointing this out. A great observation!

I found several reproducibility issues in the code and managed to fix them. However, some randomness remains unresolved.

For reference, the PR is here (in case you want to review it or contribute further): https://github.com/cod3licious/autofeat/pull/45

Cheers, J.

jtimko16, Jul 25 '24 21:07