PyMuPDF: There is an issue with the image generated by the page.get_pixmap() function

Open 1339503169 opened this issue 1 year ago • 5 comments

Description of the bug

Attachment: img_test.pdf

The image produced by the page.get_pixmap() function contains characters that are not present in the PDF. The source file shows text that appears to be 'From (Shipper) 发件人', but the rendered image does not match it. The converted image looks like this, with the red box marking the error. You can compare it with img_test.pdf.

How to reproduce the bug

Here is the code I used to generate the image:

```python
import fitz  # PyMuPDF

document = fitz.open('./data/img_test.pdf')
page = document.load_page(0)  # first page

# Render at 2x zoom with no rotation.
rotate = int(0)
zoom_x, zoom_y = 2, 2
trans = fitz.Matrix(zoom_x, zoom_y).prerotate(rotate)

pix = page.get_pixmap(matrix=trans, alpha=False)
pix.save('data/img_test.png')
```

What should I do to get the correct picture?

PyMuPDF version

1.23.7 or earlier

Operating system

Windows

Python version

3.8

Jan 03 '24 10:01 1339503169

Submitted bug report in https://bugs.ghostscript.com/show_bug.cgi?id=707451.

Jan 04 '24 14:01 JorjMcKie

Just FYI, that file renders incorrectly in Evince on Fedora GNU/Linux (which is completely independent of PyMuPDF).

Jan 05 '24 02:01 cbm755

> Just FYI, that file renders incorrectly in Evince on Fedora GNU/Linux (which is completely independent of PyMuPDF).

Thanks for this, Colin. Yeah, maybe there is a general issue with these files. I am sure we will soon hear from our friends at MuPDF.

Jan 07 '24 12:01 JorjMcKie

The file does indeed look broken. We have a fix in 1.24 that improves it.

The text now says "1 Front(Shipper)", albeit with dodgy spacing.

Essentially, it's a broken file, and we're doing as well with it as we can.

The commit in question is:

https://git.ghostscript.com/?p=mupdf.git;a=commitdiff;h=0a5b60420

I'll see about pulling this back to 1.23.x so you can get access to it soon.

Jan 30 '24 18:01 robinwatts

The MuPDF team has developed a fix that will at least improve the rendering of this type of page.

Jan 31 '24 17:01 JorjMcKie

Fixed in 1.24.0.

Mar 21 '24 21:03 julian-smith-artifex-com
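
For readers hitting the same rendering problem, a quick way to tell whether an environment already includes the fix is to compare the installed PyMuPDF version against 1.24.0, the release named in the comment above. The following is only a minimal sketch: the helper names `parse_version` and `has_rendering_fix` are hypothetical, the installed version is looked up through the standard library's `importlib.metadata` rather than any PyMuPDF API, and the parser deliberately ignores pre-release suffixes.

```python
import re
from importlib import metadata

# Release named in this thread as containing the fix ("Fixed in 1.24.0").
FIXED_IN = (1, 24, 0)


def parse_version(text):
    """Turn a version string such as '1.23.7' into a tuple of three ints.

    Hypothetical helper: only the first three numeric components are used,
    so a suffix such as 'rc1' is ignored.
    """
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    # Pad short versions such as '1.24' to three components.
    return tuple(numbers + [0] * (3 - len(numbers)))


def has_rendering_fix(version_text):
    """Return True if the given PyMuPDF version is 1.24.0 or later."""
    return parse_version(version_text) >= FIXED_IN


if __name__ == "__main__":
    try:
        # Reads the version of the installed 'PyMuPDF' distribution.
        installed = metadata.version("PyMuPDF")
    except metadata.PackageNotFoundError:
        print("PyMuPDF is not installed")
    else:
        print(installed, "has fix:", has_rendering_fix(installed))
```

If the check reports an older version, upgrading with `pip install --upgrade PyMuPDF` and re-running the reproduction script should show whether the rendering improves. As noted earlier in the thread, the file itself is broken, so the output may still not be perfect.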
