autofeat icon indicating copy to clipboard operation
autofeat copied to clipboard

#43 - Reproducibility

Open jtimko16 opened this issue 1 year ago • 7 comments

Fix several reproducibility issues.

However, I didn't manage to fix it in the whole code; there is still some uncaptured randomness that I haven't been able to solve so far.

jtimko16 avatar Jul 25 '24 21:07 jtimko16

Hello, I will go quickly through the changes and check if I everything seems ok - see code comments. I cannot fully test the thing right now, maybe in the following days. I will post new updates under issue: https://github.com/cod3licious/autofeat/issues/43.

janezlapajne avatar Jul 26 '24 07:07 janezlapajne

Thanks for looking into this! I think you definitely found some places where setting random seeds should help, but I think there are also some places where it is not necessary. To avoid overcomplicating the implementation, it would be great if in the end you could also check whether some of your additions could be removed again while still keeping it reproducible. I'd be happy to later comment on the places where I think it might be possible to keep the existing implementation while still getting reproducible results.

cod3licious avatar Jul 26 '24 11:07 cod3licious

Hello @janezlapajne and @cod3licious, thanks a lot for your comments! I will try to continue with it and incorporate the changes next week.

jtimko16 avatar Jul 26 '24 19:07 jtimko16

Thanks for looking into this! I think you definitely found some places where setting random seeds should help, but I think there are also some places where it is not necessary. To avoid overcomplicating the implementation, it would be great if in the end you could also check whether some of your additions could be removed again while still keeping it reproducible. I'd be happy to later comment on the places where I think it might be possible to keep the existing implementation while still getting reproducible results.

Hello, thanks for your suggestions. As I mentioned, unfortunately, the implementation is not fully reproducible yet. There is for sure improvement from the last time, but there is probably one more randomness that I didn't capture.

However, once I will be looking at it next week, I will do my best to remove extra random seeds and discover the remaining unsolved randomness.

jtimko16 avatar Jul 26 '24 19:07 jtimko16

@janezlapajne and @cod3licious, Thanks again for your great comments. I did my best to implement all of those.

As I mentioned before, we have made some improvements, but the feature selector is not fully reproducible yet. And unfortunately, I don't have time to dive into this deeper again.

Once you re-review the PR, we can agree on the next steps. We can either keep the PR open, or merge at least the current changes. Whatever you think is better.

jtimko16 avatar Aug 05 '24 14:08 jtimko16

Thank you for your work so far! However, I'm against merging this before it is truly ready. This also needs some proper tests (beyond a manually run notebook) to ensure that future changes don't cause a regression.

I agree, it will be better to fully finalize, rather than merge half done. As I said, I am quite busy those days, but I will try to find time for it once possible :)

jtimko16 avatar Aug 08 '24 14:08 jtimko16

Hello everyone,

I spent another time with debugging today. However, I didn't manage to identify the remaining issue - there is still some randomness, even when running FeatureSelector only for featsel_run=1.

Therefore, I am not sure whether I will be able to finish. In case anyone would like to take over, I can explain/summarize what I have done so far, or we can team up and look at it together.

All the best, J.

jtimko16 avatar Aug 17 '24 15:08 jtimko16