pdfminer.six
Debugging is slowing down our processing
Hello Guys,
I recently integrated Camelot with a FastAPI upload process to convert my PDF files to dataframes. Processing currently takes about 3 minutes per file. After digging deeper, I found that this is caused by the debug logging in pdfminer.psparser. Would it be possible to have this turned off?
Do you mean the log.debug statements?
Can you be more precise about what is slowing it down? Did you use a profiler? A reproducible piece of code with an example PDF would help to investigate and fix this issue.
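For anyone who wants to answer the profiler question, here is a minimal sketch using the standard-library cProfile and pstats modules. The helper name profile_call and the file name example.pdf are made up for illustration; they are not part of pdfminer.six or Camelot.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns the function's result and a text report of the `top`
    entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# Hypothetical usage, assuming Camelot is installed and example.pdf exists:
#   import camelot
#   tables, report = profile_call(camelot.read_pdf, "example.pdf")
#   print(report)
```

If logging really is the bottleneck, functions from the logging module should show up near the top of the report with a large cumulative time. If they do not, the time is probably being spent in parsing or layout analysis instead.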
@professorr-x I also use Camelot and I am all for faster-running processes. It would be great if you posted your findings so we could look into options for speeding this up.
Hi guys,
As a workaround, you may disable logging by adding these lines of code after importing the PDFMiner libraries:
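import logging
from pdfminer import pdfinterp, psparser, pdfdocument, pdfpage, cmapdb, pdfparser
# Raise each module-level logger to ERROR so DEBUG/INFO/WARNING calls are skipped.
# setLevel() is used instead of assigning .level so the logger's level cache is reset.
pdfinterp.log.setLevel(logging.ERROR)
psparser.log.setLevel(logging.ERROR)
pdfdocument.log.setLevel(logging.ERROR)
pdfpage.log.setLevel(logging.ERROR)
cmapdb.log.setLevel(logging.ERROR)
pdfparser.log.setLevel(logging.ERROR)
# Also set the level on the parent "pdfminer" logger.
logging.getLogger("pdfminer").setLevel(logging.ERROR)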
I hope this helps.
I don't think the logging is actually slowing this package down.
Debug messages are not formatted unless they are actually emitted. With debug logging disabled (which is the default), the only overhead is an essentially empty function call.
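The claim about deferred formatting can be checked with a small standalone sketch. It assumes the log calls pass their arguments in the "%s" style, so formatting is left to the logging module. The logger name demo.psparser and the Expensive class are made up for this demonstration and do not come from pdfminer.six.

```python
import io
import logging
import timeit

# Hypothetical logger standing in for a pdfminer module logger.
log = logging.getLogger("demo.psparser")
log.addHandler(logging.StreamHandler(io.StringIO()))  # capture output in memory
log.propagate = False  # keep the demo isolated from the root logger


class Expensive:
    """Counts how often it is converted to a string."""

    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "token"


token = Expensive()

# Debug disabled: the message is never formatted, so __str__ is not called.
log.setLevel(logging.ERROR)
log.debug("nexttoken: %s", token)
calls_when_disabled = Expensive.calls

# Debug enabled: the handler formats the record, which calls __str__.
log.setLevel(logging.DEBUG)
log.debug("nexttoken: %s", token)
calls_when_enabled = Expensive.calls

# Rough per-call cost of a disabled debug statement, in seconds.
log.setLevel(logging.ERROR)
seconds_per_call = timeit.timeit(
    lambda: log.debug("nexttoken: %s", token), number=100_000
) / 100_000

print(calls_when_disabled, calls_when_enabled, seconds_per_call)
```

Note that this only holds when the arguments are passed separately. A call such as log.debug("nexttoken: " + str(token)) builds the string before the level is checked, so it pays the formatting cost even when debug logging is off.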
If you experience otherwise, feel free to share a reproducible example that shows that debug logs slow pdfminer down.