PyMuPDF
get_pixmap method stuck on one page and runs forever
Description of the bug
I have a script that takes a PDF document URL, iterates through all the pages, generates a pixmap for each page, and uses it to create an image. However, the get_pixmap method gets stuck indefinitely on a particular page, and I'm unable to resolve it even after attempting to add timeouts.
How to reproduce the bug
Code snippet:
import fitz  # PyMuPDF
DPI = 300
# `page` is a fitz.Page obtained while iterating over the document
pixmap = page.get_pixmap(matrix=fitz.Matrix(DPI/72, DPI/72))
The issue occurs on one specific page of the PDF file. Here is the page on which the method gets stuck (please note that other pages also contain a lot of graphic elements). I would appreciate any insight into what might be causing this problem, or any guidance on how to handle it, perhaps with timeouts or alternative approaches.
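One possible workaround for the timeout question, sketched here under assumptions rather than taken from the thread: a call that hangs inside native code cannot be cancelled from a Python thread, so the rendering can be moved into a child process that is killed when a time limit expires. The helper names `run_python_with_timeout` and `render_page`, the PNG output, and the 60-second default are all hypothetical choices for illustration.

```python
import subprocess
import sys

# Script executed in the child interpreter. It renders a single page to an
# image file and assumes PyMuPDF is installed for the same interpreter.
RENDER_SCRIPT = """
import sys
import fitz  # PyMuPDF

pdf_path, page_number = sys.argv[1], int(sys.argv[2])
out_path, dpi = sys.argv[3], int(sys.argv[4])
doc = fitz.open(pdf_path)
page = doc[page_number]
pixmap = page.get_pixmap(matrix=fitz.Matrix(dpi / 72, dpi / 72))
pixmap.save(out_path)
"""


def run_python_with_timeout(script, args, timeout_s):
    """Run `script` in a fresh interpreter.

    Returns True if it exited with status 0 within `timeout_s` seconds.
    On timeout, subprocess.run kills the child before raising.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", script, *args], timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def render_page(pdf_path, page_number, out_path, dpi=300, timeout_s=60):
    """Render one page in a child process; False means failure or timeout."""
    args = [pdf_path, str(page_number), out_path, str(dpi)]
    return run_python_with_timeout(RENDER_SCRIPT, args, timeout_s)
```

Calling `render_page("input.pdf", n, f"page-{n}.png")` for each page number would let the loop skip a page that never finishes, at the cost of reopening the document in every child process.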
PyMuPDF version
1.23.21
Operating system
MacOS
Python version
3.10
I think this is probably a duplicate of #3072.
Hi @julian-smith-artifex-com! I have tried the new PyMuPDF version (1.23.22) but still have the same problem. The program just gets stuck at this page when trying to execute get_pixmap().
Ah, interesting. #3072 is definitely fixed in 1.23.22 so this looks like a new (but probably related) problem.
Can you supply your input file and python code that shows the problem?
Same problem here; 1.23.22 did not fix the issue for me either. Also, not that it's critical info regarding the bug itself, but the CPU gets pinned at 100 percent and it takes either a reboot or killing the process to stop it. So it's a fairly serious issue and a deal breaker for any code using PyMuPDF.
Just to be clear - we will need a reproducer for this problem if we are to investigate and fix it. If anyone has an example file that they can post, please do so here.
Unfortunately I am not able to share files that created this issue for me, but I can tell you that reverting to 1.21.0 has removed the problem. I will try to get a redacted file to you to test the issue with the current version.
Thank you, I'm looking forward to receiving your redacted file.
@julian-smith-artifex-com how do you recommend creating a redacted file for testing this issue?
You could use Document.select([page_number]) and then Document.ez_save() to create a document that has only the one page that shows the problem.
Alternatively, could you email the document directly to me at [email protected]? Then I'll be able to take a look without it becoming public.
An update on this: a build of PyMuPDF with the latest MuPDF from git does not hang.
So the problem will be fixed in PyMuPDF very soon after the next MuPDF release, which should be within the next week or two.
Fixed in 1.24.0.