MolScribe
fixes #22
Some images, for example US20230354702A1-20231102-C00260.TIF
from the USPTO grant red book (attached), make MolScribe hang for hours and use an unreasonable amount of RAM.
https://github.com/thomas0809/MolScribe/blob/97acee57d10bd719f4dc1cfd30d09f142b7dc65f/molscribe/chemistry.py#L200
shows:
[('L', 202)] 2020202020201 L 20202020202020201 L 2020202020201 L 20202020202020201
for this image. In some cases that means about two trillion iterations (each appending to a list), which makes mass processing of images hang and uses an unreasonable amount of memory.
The fix is extremely simple: skip the processing of elements with more than 100000 atoms.
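A minimal sketch of the idea, not the actual patch: the helper name `expand_element` and the constant `MAX_ATOMS` are hypothetical and do not appear in `molscribe/chemistry.py`. It only illustrates checking the repeat count before any list is built.

```python
from typing import List, Optional

# Threshold from the description above; the constant name is an assumption.
MAX_ATOMS = 100_000


def expand_element(symbol: str, count: int,
                   max_atoms: int = MAX_ATOMS) -> Optional[List[str]]:
    """Hypothetical helper: expand (symbol, count) into a list of atoms.

    Returns None when count exceeds max_atoms, so the caller can skip the
    element instead of appending to a list trillions of times.
    """
    if count > max_atoms:
        return None  # skip: expansion would hang and exhaust memory
    return [symbol] * count


# A normal repeat count is expanded as usual.
small = expand_element("L", 202)
# The count seen for the problematic image is rejected immediately.
huge = expand_element("L", 2020202020201)
```

Because the check runs before any allocation, a rejected element costs constant time and memory.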