
fixes #22

Open eloyfelix opened this issue 10 months ago • 0 comments

Some images, for example US20230354702A1-20231102-C00260.TIF from the USPTO grant red book (attached), cause MolScribe to hang for hours and use an unreasonable amount of RAM.

https://github.com/thomas0809/MolScribe/blob/97acee57d10bd719f4dc1cfd30d09f142b7dc65f/molscribe/chemistry.py#L200

shows:

[('L', 202)] 2020202020201 L 20202020202020201 L 2020202020201 L 20202020202020201

for this image. That amounts to about two trillion iterations (each appending to a list) in some cases, which makes bulk image processing hang and consumes an unreasonable amount of memory.

The fix is extremely simple: skip the processing of elements with more than 100000 atoms.
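As a rough illustration of that guard (not the actual patch), here is a minimal sketch. The function name `expand_elements`, the constant `MAX_ATOMS`, and the `(symbol, count)` input shape are assumptions made for this example and are not taken from `molscribe/chemistry.py`.

```python
from typing import List, Tuple

# Threshold from the description above; the name MAX_ATOMS is hypothetical.
MAX_ATOMS = 100_000


def expand_elements(elements: List[Tuple[str, int]],
                    max_atoms: int = MAX_ATOMS) -> List[str]:
    """Expand (symbol, count) pairs into a flat list of atom symbols.

    Hypothetical helper: any pair whose count exceeds max_atoms is skipped,
    so a misread count cannot trigger trillions of list appends.
    """
    expanded: List[str] = []
    for symbol, count in elements:
        if count > max_atoms:
            # Implausible count, most likely a recognition error: skip it.
            continue
        expanded.extend([symbol] * count)
    return expanded


# A normal formula fragment expands as usual.
print(expand_elements([("C", 2), ("H", 3)]))
# A count like the one reported for this image is skipped instead of expanded.
print(expand_elements([("L", 2020202020201)]))
```

For scale, a CPython list on a 64-bit build stores one 8-byte pointer per entry, so roughly two trillion entries would need on the order of 16 TB for the list alone, which is why skipping is preferable to attempting the expansion.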

US20230354702A1-20231102-C00260.TIF.zip

eloyfelix, Apr 23 '24 13:04