PyMuPDF
There is an issue with the image generated by the page.get_pixmap() function
Description of the bug
Attached file: img_test.pdf. The image produced by page.get_pixmap() contains characters that are not present in the PDF. The source file shows text that appears to be 'From (Shipper) 发件人', but the rendered image does not match the PDF. In the converted image, the red box marks the error. You can compare it with img_test.pdf.
How to reproduce the bug
Here is the code I used to generate the image:

```python
import fitz

document = fitz.open('./data/img_test.pdf')
page = document.load_page(0)  # first page

rotate = int(0)        # no rotation
zoom_x, zoom_y = 2, 2  # render at 2x scale in both directions
trans = fitz.Matrix(zoom_x, zoom_y).prerotate(rotate)

pix = page.get_pixmap(matrix=trans, alpha=False)
pix.save('data/img_test.png')
```

What should I do to get the correct picture?
PyMuPDF version
1.23.7 or earlier
Operating system
Windows
Python version
3.8
Submitted bug report in https://bugs.ghostscript.com/show_bug.cgi?id=707451.
Just FYI, that file renders incorrectly in Evince on Fedora GNU/Linux (which is completely independent of PyMuPDF).
Thanks for this, Colin. Yeah, maybe there is a general issue with these files. I am sure we will soon hear from our friends at MuPDF.
The file does indeed look broken. We have a fix in 1.24 that improves it.
The text now says "1 Front(Shipper)", albeit with dodgy spacing.
Essentially, it's a broken file, and we're doing as well with it as we can.
The commit in question is:
https://git.ghostscript.com/?p=mupdf.git;a=commitdiff;h=0a5b60420
I'll see about pulling this back to 1.23.x so you can get access to it soon.
The MuPDF team has developed a fix that will at least improve the rendering of pages of this type.
Fixed in 1.24.0.
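For anyone landing here later, a minimal sketch for checking whether the installed PyMuPDF is at least 1.24.0 before rendering. The helper names (`parse_version`, `has_rendering_fix`, `installed_pymupdf_version`) are made up for illustration and are not part of PyMuPDF. The sketch assumes the package is installed under the distribution name "PyMuPDF" and only compares the numeric parts of the version string, so a pre-release such as "1.24.0rc1" would be treated like 1.24.0.

```python
from importlib import metadata

# First release reported in this thread as containing the fix.
FIXED_IN = (1, 24, 0)


def parse_version(text):
    """Turn a string like '1.23.7' into a tuple like (1, 23, 7).

    Only the leading digits of each dot-separated part are used,
    and the result is padded with zeros to three components.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_rendering_fix(version_text):
    """Return True if the given version is 1.24.0 or later."""
    return parse_version(version_text) >= FIXED_IN


def installed_pymupdf_version():
    """Return the installed PyMuPDF version string, or None if not installed.

    Assumption: the distribution is registered as 'PyMuPDF'.
    """
    try:
        return metadata.version("PyMuPDF")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_pymupdf_version()
    if version is None:
        print("PyMuPDF is not installed")
    elif has_rendering_fix(version):
        print("PyMuPDF", version, "should include the improved rendering")
    else:
        print("PyMuPDF", version, "predates 1.24.0; consider upgrading")
```

Note that, as said above, the file itself is broken, so even with 1.24.0 the rendering is improved rather than guaranteed to match other viewers.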