get_pixmap method stuck on one page and runs forever

Open SofiiaChaban opened this issue 1 year ago • 7 comments

Description of the bug

I have a script that takes a PDF document URL, iterates through all the pages, generates a pixmap for each page, and uses it to create an image. However, the get_pixmap method gets stuck indefinitely on a particular page, and I'm unable to resolve it even after attempting to add timeouts.

How to reproduce the bug

Code snippet:

DPI = 300
pixmap = page.get_pixmap(matrix=fitz.Matrix(DPI/72, DPI/72))

The hang occurs on one specific page of the PDF, shown below. Note that other pages also contain many graphic elements. I would appreciate any insight into what might be causing this, or guidance on how to handle it, for example with timeouts or an alternative approach.

[image: the page on which get_pixmap hangs]

PyMuPDF version

1.23.21

Operating system

MacOS

Python version

3.10

SofiiaChaban avatar Feb 02 '24 11:02 SofiiaChaban
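
On the timeout question raised in the report: a hang inside get_pixmap happens in native code, so a timeout generally has to be enforced from outside the rendering process. The following is only a minimal sketch of one way to do that, not a fix for the underlying bug. It renders each page in a separate Python interpreter and kills that interpreter if it exceeds a time limit. The names `RENDER_SCRIPT`, `run_child`, and `render_page`, and the 60-second default, are illustrative assumptions and do not come from the thread.

```python
import subprocess
import sys

# Child script (hypothetical helper): renders a single page to an image file.
# It runs in its own interpreter so that a hang can be killed from outside.
RENDER_SCRIPT = """
import sys
import fitz  # PyMuPDF

pdf_path, page_number, dpi, out_path = sys.argv[1:5]
doc = fitz.open(pdf_path)
page = doc[int(page_number)]
zoom = int(dpi) / 72
pixmap = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
pixmap.save(out_path)
"""


def run_child(script, args, timeout_s):
    """Run `script` in a fresh Python process.

    Returns True if the child exits with status 0 within `timeout_s`
    seconds, and False if it fails or has to be killed.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", script, *map(str, args)],
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this exception.
        return False
    return result.returncode == 0


def render_page(pdf_path, page_number, out_path, dpi=300, timeout_s=60):
    """Render one page (0-based) with a time limit; True on success."""
    return run_child(
        RENDER_SCRIPT, [pdf_path, page_number, dpi, out_path], timeout_s
    )
```

A caller could loop over the page numbers, call `render_page` for each, and log the pages that return False. Starting an interpreter per page is slow, but it keeps a stuck page from pinning the CPU indefinitely.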

I think this is probably a duplicate of #3072.

Hi @julian-smith-artifex-com! I have tried the new PyMuPDF version (1.23.22) but still have the same problem. The program just gets stuck on this page when trying to execute get_pixmap().

SofiiaChaban avatar Feb 15 '24 09:02 SofiiaChaban

Ah, interesting. #3072 is definitely fixed in 1.23.22 so this looks like a new (but probably related) problem.

Can you supply your input file and python code that shows the problem?

Same problem here: 1.23.22 did not fix the issue for me either. It may not be critical information about the bug itself, but the CPU gets pinned at 100 percent, and stopping it requires either a reboot or killing the process. So it's a fairly serious issue and a deal breaker for any code using PyMuPDF.

mikecummings34 avatar Feb 15 '24 23:02 mikecummings34

Just to be clear - we will need a reproducer for this problem if we are to investigate and fix it. If anyone has an example file that they can post, please do so here.

Unfortunately I am not able to share files that created this issue for me, but I can tell you that reverting to 1.21.0 has removed the problem. I will try to get a redacted file to you to test the issue with the current version.

kblevins avatar Feb 23 '24 17:02 kblevins

Thank you, I'm looking forward to receiving your redacted file.

@julian-smith-artifex-com how do you recommend creating a redacted file for testing this issue?

kblevins avatar Mar 07 '24 18:03 kblevins

You could use Document.select([page_number]) followed by Document.ez_save() to create a document that contains only the one page that shows the problem.

Alternatively, could you email the document directly to me at [email protected]? Then I'll be able to take a look without it becoming public.
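
The Document.select() / Document.ez_save() suggestion above could look roughly like the sketch below. The function names `save_single_page` and `make_reproducer` are hypothetical. Note that this only reduces the file to a single page. It does not redact anything on that page, so the page contents remain as they were.

```python
def save_single_page(doc, page_number, out_path):
    """Reduce an open document to one page (0-based) and save it.

    `doc` is expected to be an open PyMuPDF Document.
    """
    if not 0 <= page_number < doc.page_count:
        raise ValueError(f"page_number {page_number} is out of range")
    doc.select([page_number])  # keep only the listed page
    doc.ez_save(out_path)      # save with PyMuPDF's size-reducing defaults


def make_reproducer(pdf_path, page_number, out_path):
    """Open `pdf_path` and write a one-page copy to `out_path`."""
    import fitz  # PyMuPDF; imported here so the helper above stays dependency-free

    doc = fitz.open(pdf_path)
    try:
        save_single_page(doc, page_number, out_path)
    finally:
        doc.close()
```

For example, `make_reproducer("input.pdf", 4, "page5-only.pdf")` would write a file containing only the fifth page. Both file names are placeholders.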

An update on this: a build of PyMuPDF with the latest MuPDF from git does not hang.

So the problem will be fixed in PyMuPDF very soon after the next MuPDF release, which should be in the next week or two.

Fixed in 1.24.0.