pdfminer.six Debugging is slowing down our processing


Open. professorr-x opened this issue.

Hello Guys,

I recently integrated Camelot into a FastAPI upload process to convert my PDF files to DataFrames. Processing currently takes 3 minutes per file. After digging deeper, I found that this is caused by the pdfminer.psparser debugging. Would it be possible to turn this off?

Jan 28 '22 11:01 professorr-x

Do you mean the log.debug statements?

Can you be more precise about what is slowing it down? Did you use a profiler? A reproducible piece of code with an example PDF would help to investigate and fix this issue.
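A minimal sketch of how the slow step could be profiled with Python's built-in cProfile and pstats modules. The names process_pdf and profile_call are hypothetical; process_pdf is only a stand-in loop, and its body would be replaced with the actual Camelot/pdfminer call on the slow PDF.

```python
import cProfile
import io
import pstats


def process_pdf(path: str) -> int:
    """Hypothetical stand-in for the real work (e.g. a Camelot/pdfminer call on `path`)."""
    total = 0
    for i in range(200_000):
        total += i % 7
    return total


def profile_call(func, *args, top: int = 10) -> str:
    """Run func(*args) under cProfile and return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_call(process_pdf, "example.pdf"))
```

Sorting by cumulative time shows which call tree dominates; if logging were the bottleneck, functions from the logging module would appear near the top of the report.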

Jan 29 '22 14:01 pietermarsman

@professorr-x I also use Camelot and I am all for having faster running processes. It would be great if you could post your findings so we can look into what could be done to speed this up.

Feb 11 '22 21:02 rain01

Hi guys,

As a workaround, you can silence pdfminer's logging (everything below ERROR level) by adding these lines of code after importing the pdfminer modules:

import logging

from pdfminer import cmapdb, pdfdocument, pdfinterp, pdfpage, pdfparser, psparser

# Raise each pdfminer module logger to ERROR. Use setLevel() rather than
# assigning .level directly: setLevel() also clears the logging module's
# cached "is this level enabled?" results (Python 3.7+).
for module in (pdfinterp, psparser, pdfdocument, pdfpage, cmapdb, pdfparser):
    module.log.setLevel(logging.ERROR)

# The parent "pdfminer" logger also covers any other pdfminer modules.
logging.getLogger("pdfminer").setLevel(logging.ERROR)
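In the workaround above, the final line on the parent "pdfminer" logger is typically enough on its own, assuming nothing else has set an explicit level on the individual module loggers. Loggers created with logging.getLogger() start at NOTSET and inherit their effective level from the nearest ancestor. The sketch below only creates loggers by name (pdfminer itself is not imported) to show that inheritance:

```python
import logging

# Module loggers are named after their modules, e.g. "pdfminer.psparser".
child = logging.getLogger("pdfminer.psparser")
parent = logging.getLogger("pdfminer")

# Setting the level on the parent only; the child keeps NOTSET.
parent.setLevel(logging.ERROR)

print(child.level == logging.NOTSET)                # True
print(child.getEffectiveLevel() == logging.ERROR)   # True
print(child.isEnabledFor(logging.DEBUG))            # False
print(child.isEnabledFor(logging.ERROR))            # True
```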

I hope this helps.

Aug 18 '22 08:08 sylvain-josserand

I don't think the logging is actually slowing this package down.

Debug messages are not formatted unless they are actually emitted. When debug logging is disabled (the default), the only overhead is a function call that returns almost immediately.
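A small sketch of the standard-library behaviour described here, assuming the caller passes values as %-style arguments (log.debug("value: %s", x)) rather than pre-formatting them. The logger name "demo.lazy" and the CountingArg class are hypothetical; CountingArg just counts how often it is converted to a string.

```python
import io
import logging


class CountingArg:
    """Records how many times it is converted to a string."""

    def __init__(self) -> None:
        self.calls = 0

    def __str__(self) -> str:
        self.calls += 1
        return "expensive value"


stream = io.StringIO()
log = logging.getLogger("demo.lazy")  # hypothetical logger, not one of pdfminer's
log.addHandler(logging.StreamHandler(stream))
log.propagate = False

# 1) DEBUG disabled, %-style args: the message is never formatted.
log.setLevel(logging.WARNING)
lazy_arg = CountingArg()
log.debug("value: %s", lazy_arg)

# 2) DEBUG disabled, f-string: the string is built before debug() is even called.
eager_arg = CountingArg()
log.debug(f"value: {eager_arg}")

# 3) DEBUG enabled, %-style args: formatted once, when the handler emits the record.
log.setLevel(logging.DEBUG)
enabled_arg = CountingArg()
log.debug("value: %s", enabled_arg)

print(lazy_arg.calls, eager_arg.calls, enabled_arg.calls)  # 0 1 1
```

Case 2 shows the pattern that would make disabled debug logging cost real work; with %-style arguments, a disabled debug() call does no string formatting at all.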

If you experience otherwise, feel free to share a reproducible example that shows that debug logs slow pdfminer down.
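One possible shape for such a reproducible example: time the same workload with the "pdfminer" logger at DEBUG and at ERROR. This is a sketch under stated assumptions; time_at_level, compare_levels, the "pdfminer.demo" logger and fake_workload are all hypothetical, and fake_workload would be replaced with the real processing of a sample PDF.

```python
import io
import logging
import time
from typing import Callable, Dict


def time_at_level(workload: Callable[[], object], logger_name: str, level: int,
                  repeats: int = 3) -> float:
    """Best-of-N wall-clock time for workload() with logger_name set to level."""
    logger = logging.getLogger(logger_name)
    previous = logger.level
    logger.setLevel(level)
    try:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            workload()
            best = min(best, time.perf_counter() - start)
        return best
    finally:
        logger.setLevel(previous)  # restore whatever level was set before


def compare_levels(workload: Callable[[], object],
                   logger_name: str = "pdfminer") -> Dict[str, float]:
    """Time the same workload with DEBUG enabled and with only ERROR enabled."""
    return {
        "DEBUG": time_at_level(workload, logger_name, logging.DEBUG),
        "ERROR": time_at_level(workload, logger_name, logging.ERROR),
    }


if __name__ == "__main__":
    # Send records to an in-memory sink so DEBUG records are actually formatted,
    # as they would be if an application had enabled DEBUG logging.
    logging.getLogger("pdfminer").addHandler(logging.StreamHandler(io.StringIO()))
    demo_log = logging.getLogger("pdfminer.demo")  # hypothetical stand-in logger

    def fake_workload() -> None:
        # Placeholder: swap in the real call on a sample PDF.
        for i in range(20_000):
            demo_log.debug("token %d", i)

    print(compare_levels(fake_workload))
```

Taking the best of several runs reduces noise from other processes; a large gap between the two timings on a real PDF would be the kind of evidence requested here.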

Aug 18 '22 18:08 pietermarsman
