MolScribe
USPTO images missing for the "small" train script
Hello, I've recently been attempting to retrain MolScribe using the scripts you provide in `scripts`. First of all, thanks for providing all your data and training scripts, which are extremely helpful.
Second, when running `train_uspto_joint_chartok.sh`, I get a series of missing image warnings like:
[ WARN:[email protected]] global loadsave.cpp:248 findDecoder imread_('data/uspto_mol/2002/20020723/US06423704-20020723/US06423704-20020723-C00135.TIF'): can't open/read file: check file path/integrity
I downloaded the ZIP from the link provided in the README: https://www.dropbox.com/s/3podz99nuwagudy/uspto_mol.zip?dl=0, and unzipped it into `data`. This is not an issue when running `train_uspto_joint_chartok_1m680k.sh`. The problem is that `uspto_mol/train_200k.csv` has paths to images not provided in the ZIP archive.
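In case it helps reproduce this, here is a minimal sketch of how the mismatch could be checked without starting a training run. It reads a CSV and reports every listed image path that does not exist on disk. The function name `find_missing_images` is my own, and the column name `file_path` is an assumption about the CSV layout, so adjust `path_column` if the header differs.

```python
import csv
from pathlib import Path


def find_missing_images(csv_path, root=".", path_column="file_path"):
    """Return the image paths listed in csv_path that are not files under root.

    path_column is an assumed header name; pass the real one if it differs.
    """
    missing = []
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        # Fail early with a readable message if the assumed column is absent.
        if path_column not in (reader.fieldnames or []):
            raise KeyError(
                f"column {path_column!r} not found; columns are {reader.fieldnames}"
            )
        for row in reader:
            rel_path = row[path_column]
            # Paths are resolved relative to root, e.g. the repository root.
            if not (Path(root) / rel_path).is_file():
                missing.append(rel_path)
    return missing
```

Calling `find_missing_images("data/uspto_mol/train_200k.csv")` from the directory the CSV paths are relative to should return the list of images absent from the unzipped archive; an empty list would mean every listed file is present.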
It would be good to be able to run the smaller training set for quicker comparisons to your saved checkpoint. Let me know if this is fixable. Thanks for your time and this model!