keras-molecules icon indicating copy to clipboard operation
keras-molecules copied to clipboard

Add random seed of training for reproducibility

Open hsiaoyi0504 opened this issue 8 years ago • 17 comments

As title

hsiaoyi0504 avatar Dec 21 '16 10:12 hsiaoyi0504

For issue #50

hsiaoyi0504 avatar Dec 21 '16 10:12 hsiaoyi0504

Could you also change the generator-based approach? Specifically, there is a random.shuffle call here that can be seeded: https://github.com/maxhodak/keras-molecules/blob/master/molecules/vectorizer.py. Additionally, perhaps seed should be a flag that can be passed into train.py and train_gen.py.

pechersky avatar Dec 21 '16 13:12 pechersky

Sure, I will work on it.

hsiaoyi0504 avatar Dec 21 '16 14:12 hsiaoyi0504

I don't think it's a good thing to do this (I mean add flag) , according to the comment here, by default Keras's model.compile() sets the shuffle argument as True. You should the set numpy seed before importing keras. Then, adding a flag would make code messy.

hsiaoyi0504 avatar Dec 21 '16 14:12 hsiaoyi0504

Oh, I got it. However, I think it's two different things. Random seed for numpy and random seed for random.

hsiaoyi0504 avatar Dec 21 '16 14:12 hsiaoyi0504

Already done (only one line code lol)

hsiaoyi0504 avatar Dec 21 '16 14:12 hsiaoyi0504

I meant something like that SmilesDataGenerator.init could take a seed kwarg, which would be passed in at train_gen.py, using some sort of flag. In that case, it's ok to pass it in as acquired from a flag. Regarding train.py, you could move the keras import statements into the main function, with argparsing of whether there is a seed flag or not.

Let's also pull the requirements commit out of this PR, and I'll accept it in the other PR.

pechersky avatar Dec 21 '16 15:12 pechersky

Is it ok?

hsiaoyi0504 avatar Dec 22 '16 05:12 hsiaoyi0504

I've committed a couple changes to train and train_gen to take the seed as a cli parameter. Could you test that they work as you expect?

pechersky avatar Dec 22 '16 13:12 pechersky

I test using tensorflow backend, and I found it doesn't work. After a quick search of this, it seems still open issue. Besides, I failed to execute using theano backend. I am still checking what happened.

hsiaoyi0504 avatar Dec 23 '16 14:12 hsiaoyi0504

Do you at least get consistent shuffling? Is it just the weight initialization that is not seeded?

On Fri, Dec 23, 2016 at 9:21 AM, hsiao yi [email protected] wrote:

I test using tensorflow backend, and I found it doesn't work. After a quick search of this, it seems still open issue https://github.com/fchollet/keras/issues/2280. Besides, I failed to execute using theano backend. I am still checking what happened.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/maxhodak/keras-molecules/pull/51#issuecomment-268996757, or mute the thread https://github.com/notifications/unsubscribe-auth/AFGDhjBjitQKZOHcEO4NUhnAhsVSS7-pks5rK9jSgaJpZM4LSxtK .

pechersky avatar Dec 23 '16 14:12 pechersky

It looks like the train.py works fine using theano backend, so I think we can wait the keras update in the future. I will now start checking the train_gen.py part.

hsiaoyi0504 avatar Dec 24 '16 04:12 hsiaoyi0504

I am still unable to execute the train_gen.py. What is the input of this file?

hsiaoyi0504 avatar Dec 24 '16 06:12 hsiaoyi0504

Oh, I find that I used wrong input file. The train_gen.py doesn't need preprocessing.py.

hsiaoyi0504 avatar Dec 24 '16 06:12 hsiaoyi0504

#2743 is still an issue for me, despite all the suggestions. Restarting the notebook gives different results as well.

keras 2.0.2 numpy 1.11.2 tensorflow 1.0.0

pylang avatar Apr 01 '17 00:04 pylang

It turns out that it is due to fchollet/keras#2280. I gave up trying these things long time ago. It seems that it can't be fixed in an easy manner.

hsiaoyi0504 avatar Apr 01 '17 02:04 hsiaoyi0504

Thanks for the information. Non-reproducible results is a serious issue in keras imo.

pylang avatar Apr 01 '17 03:04 pylang