
USPTO images missing for the "small" train script

Open rytheranderson opened this issue 1 year ago • 2 comments

Hello, I've recently been attempting to retrain MolScribe using the scripts you provide in the scripts directory. First of all, thanks for providing all your data and training scripts; they are extremely helpful.

Second, when running train_uspto_joint_chartok.sh I get a series of missing image warnings like:

[ WARN:[email protected]] global loadsave.cpp:248 findDecoder imread_('data/uspto_mol/2002/20020723/US06423704-20020723/US06423704-20020723-C00135.TIF'): can't open/read file: check file path/integrity

I downloaded the ZIP from the link provided in the README: https://www.dropbox.com/s/3podz99nuwagudy/uspto_mol.zip?dl=0, and unzipped it into the data directory. The warnings do not appear when running train_uspto_joint_chartok_1m680k.sh. The problem is that uspto_mol/train_200k.csv contains paths to images that are not included in the ZIP archive.
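In case it helps with reproducing this, here is a minimal sketch of how the missing files could be listed. It makes two assumptions that should be checked against the actual CSV: that the image paths live in a column named file_path, and that they are relative either to the working directory or to the data directory. The function name find_missing_images and its parameters are just illustrative, not part of MolScribe.

```python
import csv
from pathlib import Path


def find_missing_images(csv_path, data_root="data", path_column="file_path"):
    """Return the image paths listed in csv_path that cannot be found on disk.

    path_column is an assumed column name; adjust it to match the CSV header.
    """
    missing = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            listed = row[path_column]
            # The listed path may already include the data root, so try it
            # as given first and then relative to data_root.
            if Path(listed).exists() or (Path(data_root) / listed).exists():
                continue
            missing.append(listed)
    return missing
```

Run from the repository root, find_missing_images("data/uspto_mol/train_200k.csv") would return every listed path with no matching file, which makes it easy to see how much of the 200k subset is affected.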

It would be good to be able to run the smaller training set for quicker comparisons to your saved checkpoint. Let me know if this is fixable. Thanks for your time and this model!

rytheranderson, Jan 09 '24 22:01