Yakov Pechersky

60 comments by Yakov Pechersky

I am able to load the weights without issue, after freshly downloading both the smiles_500 and model_500 h5 files. Can you do me a favor, and run the following: ```...

As far as I can tell, the model_500k.h5 that is in the data is older than the current preprocess code. I'd suggest trying `sample_gen.py` directly from the smiles datafiles. I'd...

Would we be alright with switching to generator-based training? That gets rid of the need to preprocess and the need to compress as well. On Mon, Nov 14, 2016...

@dakoner can you provide a link to the 50M GDB-17 dataset you're using?

The following branch should be able to train using a stream-based approach, requiring way less RAM. It also provides a solution for issue #39. Please test it out -- you'll...

You might have gotten the epoch warning if your batch_size doesn't cleanly divide epoch_size. Thanks for your comments here and on the commit. Could you share the command that you...
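The divisibility point above can be checked directly. This is a minimal sketch with hypothetical numbers (the variable names and values are illustrative, not from the repo): Keras warns when the samples per epoch are not a whole number of batches, because the final batch comes up short.

```python
# Hypothetical values: check whether batch_size cleanly divides the epoch size.
samples_per_epoch = 28000
batch_size = 300

remainder = samples_per_epoch % batch_size
if remainder:
    print(f"epoch warning likely: last batch would hold only {remainder} samples")
else:
    print("batch_size cleanly divides samples_per_epoch; no warning expected")
```

Here 28000 % 300 == 100, so the last batch would hold 100 samples and the warning fires; picking a batch size such as 280 avoids it.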

@dakoner There was a bug in encoding, it wasn't properly encoding padded words. I've also fixed the bugs you've pointed out. Now `train_gen` quickly reaches >60% accuracy within the first...

The sampling is with replacement, so any epoch size can be used. I chose "with replacement" so that the generator carries as little state as possible. For some...
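A stateless sampling-with-replacement generator can be sketched as follows. This is an illustration only, not the repo's `SmilesDataGenerator` (which also vectorizes each SMILES string); the function name and data are made up.

```python
import random

def smiles_batch_generator(smiles_list, batch_size, seed=None):
    """Yield batches drawn with replacement.

    Because every draw is independent, the generator needs no shuffle
    order or epoch index, and any epoch size is valid.
    """
    rng = random.Random(seed)
    while True:
        yield [rng.choice(smiles_list) for _ in range(batch_size)]

# Usage: each next() call returns an independent batch of 4 strings.
gen = smiles_batch_generator(["CCO", "c1ccccc1", "CC(=O)O"], batch_size=4)
batch = next(gen)
```

The trade-off is that a given molecule may appear more than once per epoch while another is skipped, but in expectation the data is covered evenly.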

In the `molecules.vectorizer`, `SmilesDataGenerator` takes a `test_split` optional parameter that creates the "index point" you mentioned. By default, it is `0.20`, so 4/5 of the data is used for training,...

I should add that if you are training on 35K, and you assume the default `test_split=0.20`, then your "true" effective training set size is 28K. That's the epoch size you'll...
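The arithmetic above is just the train fraction applied to the dataset size; a quick sketch (the numbers mirror the comment, the variable names are mine):

```python
# Effective training-set size under the default test_split=0.20:
# 1/5 of the data is held out, so 4/5 remains for training.
n_total = 35_000
test_split = 0.20

n_train = round(n_total * (1 - test_split))
n_test = n_total - n_train
print(n_train, n_test)  # 28000 7000
```

So with 35K molecules the "true" epoch size for training is 28K, and that is the number the batch size should divide cleanly.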