
get_pixmap may cause large image size

Open BrentHuang opened this issue 1 year ago • 2 comments

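```python
import fitz  # PyMuPDF

pdf_file = "input.pdf"    # placeholder: path to the source PDF
img_file_prefix = "page"  # placeholder: prefix for the output images
doc = fitz.open(pdf_file)
for page in doc:
    pix = page.get_pixmap()  # default settings: 72 dpi, RGB
    img_file = f'{img_file_prefix}-{page.number}.jpg'
    pix.save(img_file)
```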

Will `get_pixmap` in the code above produce JPG files that are too large? Is there a better way to convert every page of a PDF into a JPG image?

BrentHuang, Mar 18 '24 12:03

The pixmap size is tied directly to the page size: roughly width * height * 3 bytes for an RGB image, with width and height measured in pixels. You can reduce this in several ways, e.g. by using grayscale instead of RGB (3 times smaller), or by downscaling with a DPI value below 72, which is PyMuPDF's default rendering resolution.
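To make the arithmetic concrete, here is a minimal sketch that estimates the uncompressed pixmap size from the page size in points (1/72 inch). The function name `estimate_pixmap_bytes` is hypothetical, the result is an approximation (PyMuPDF's own rounding of pixel dimensions may differ by a pixel), and it ignores alpha. The file written by `pix.save` is compressed, so it is normally much smaller than this figure; the estimate only shows how DPI and color channels scale the raw image.

```python
def estimate_pixmap_bytes(width_pt, height_pt, dpi=72, channels=3):
    """Approximate raw sample size of a page rendered at `dpi`.

    Page sizes are in points (1/72 inch), so the pixel width is about
    width_pt * dpi / 72. channels=3 for RGB, 1 for grayscale (no alpha).
    """
    width_px = round(width_pt * dpi / 72)
    height_px = round(height_pt * dpi / 72)
    return width_px * height_px * channels


# An A4 page is about 595 x 842 points.
print(estimate_pixmap_bytes(595, 842))              # 1502970 (RGB, 72 dpi)
print(estimate_pixmap_bytes(595, 842, channels=1))  # 500990 (grayscale, 72 dpi)
print(estimate_pixmap_bytes(595, 842, dpi=50))      # 724815 (RGB, 50 dpi)
```

Switching to grayscale divides the size by 3, while lowering the DPI shrinks it with the square of the scale factor, since both width and height get smaller.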

JorjMcKie, Mar 18 '24 13:03

When you save the JPEG you can also set its quality (see the `jpg_quality` parameter of `Pixmap.save` in the documentation); a lower quality value produces a smaller file. Also try PNG instead: it may compress better than JPG.

JorjMcKie, Mar 18 '24 13:03