timeout option

Open DanielRuf opened this issue 5 years ago • 2 comments

Hi,

pdfx is very helpful for us to analyze a few things. Thanks for creating pdfx.

But we have a small problem. When a pdf file contains much text pdfx / python only fails after the "too many recursions" error is thrown.

It would be helpful to have a max-timeout option to prevent that pdfx tries to parse files for 45 minutes and more (in our case).

And another small question: how could we scan / check many files at once in the best way? So far we run single pdfx commands from a bash script and wait until every command has finished. Using the & trick would cause some issues with the job scheduler of the OS and that the whole OS freezes.

Aug 19 '20 18:08 DanielRuf

Could you post the full stack trace, and perhaps an example PDF? Please reopen the issue with those, thanks 🙏

Apr 12 '21 19:04 metachris

Please reopen the issue with those, thanks

Only you can reopen the issue ;-)

Here is an example file:

54013162437.pdf

This stacktrace is produced:

pdfx.log

Apr 12 '21 20:04 DanielRuf