autofeat
Reproducibility issue
Hello,
I noticed that results are not reproducible when using the library, i.e. the sklearn drop-in replacement classes produce slightly different results on each run.
For example, when using:
from autofeat import AutoFeatClassifier
features_engineer = AutoFeatClassifier()
features_engineer.fit_transform(data_train.data, data_train.target.value)
it will calculate (or select) different features each time.
I temporarily fixed the issue above by setting the global seeds:
import random
import numpy as np
random.seed(seed)
np.random.seed(seed)
so that the outputs produced by AutoFeatClassifier stay constant across runs.
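The seeding workaround can be wrapped in a small helper. This is a minimal sketch: `set_global_seeds` is a hypothetical name, not part of autofeat, and it only affects code that draws from the global `random` and `np.random` generators.

```python
import random

try:
    import numpy as np
except ImportError:  # keeps the sketch runnable without NumPy
    np = None


def set_global_seeds(seed: int) -> None:
    """Seed Python's and (if installed) NumPy's global random generators."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)


# Reseeding with the same value replays the same random sequence.
set_global_seeds(42)
first = [random.random() for _ in range(3)]
set_global_seeds(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Calling the helper immediately before each `fit_transform` call resets the global state, which is what makes consecutive runs comparable.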
However, when I tried using the following:
from autofeat import FeatureSelector
selector = FeatureSelector(verbose=self.verbose, problem_type="classification", featsel_runs=5)
selector.fit_transform(df_indices, target)
the seed-setting trick above did not have the desired effect: the selected features still change between runs.
Is there an easy way to fix this? Randomness must be introduced somewhere in the source, damn.
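A quick way to confirm this kind of behaviour is to reseed before every run and compare the selected feature names. The sketch below uses hypothetical stand-in functions rather than autofeat itself: `seeded_selection` draws from the global generator, while `unseeded_selection` uses a source of randomness that global seeding cannot reach, which is one possible explanation for the symptom described above.

```python
import random
from typing import Callable, List, Sequence


def is_reproducible(select: Callable[[], Sequence[str]], seed: int, n_runs: int = 3) -> bool:
    """Return True if `select` yields the same features on every seeded run."""
    results: List[tuple] = []
    for _ in range(n_runs):
        random.seed(seed)  # reset the global RNG before each run
        results.append(tuple(select()))
    return all(r == results[0] for r in results)


# Hypothetical candidate feature names, for illustration only.
CANDIDATES = ["x1", "x2", "x1*x2", "log(x1)", "x2**2"]


def seeded_selection() -> List[str]:
    # Draws from the global RNG, so reseeding makes it repeatable.
    return sorted(random.sample(CANDIDATES, 3))


def unseeded_selection() -> List[str]:
    # Uses an OS-backed RNG, so the global seed has no effect on it.
    rng = random.SystemRandom()
    return sorted(rng.sample(CANDIDATES, 3))


print(is_reproducible(seeded_selection, seed=0))
print(is_reproducible(unseeded_selection, seed=0, n_runs=10))
```

In practice `select` would be a small wrapper that fits the selector on fixed data and returns the list of selected column names.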
Also, I just noticed that if the number of cores used (i.e. n_jobs) is >1, the results are not reproducible in the first scenario either. So the results are reproducible if n_jobs==1 and stochastic if n_jobs==-1.
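A common cause of this pattern is that parallel workers do not share the global random state seeded in the main process; whether that is what happens inside autofeat is an assumption here. One general remedy is to give each task its own explicit seed and its own generator. The sketch below uses threads and a hypothetical `run_task` function purely for illustration:

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_task(task_seed: int) -> float:
    """Hypothetical unit of work that owns its RNG instead of using the global one."""
    rng = random.Random(task_seed)
    return sum(rng.random() for _ in range(100))


def run_all(base_seed: int, n_tasks: int, n_workers: int) -> List[float]:
    # Derive one fixed seed per task, so results do not depend on scheduling.
    task_seeds = [base_seed + i for i in range(n_tasks)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in input order regardless of completion order.
        return list(pool.map(run_task, task_seeds))


serial = run_all(base_seed=42, n_tasks=5, n_workers=1)
parallel = run_all(base_seed=42, n_tasks=5, n_workers=4)
print(serial == parallel)  # True
```

Adding an offset to a base seed is the simplest derivation; with NumPy, `np.random.SeedSequence(base_seed).spawn(n_tasks)` is a more robust way to obtain independent per-task streams.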
Cheers.
Hey @janezlapajne, thanks for pointing this out. A great observation!
I found several reproducibility issues in the code that I managed to fix. However, there is still some remaining randomness that is unresolved.
For reference, the PR is here (in case you want to review it or contribute further): https://github.com/cod3licious/autofeat/pull/45
Cheers, J.