
`.pagesplit()` not working with PDFs produced by iOS Quartz

Open TarunChakitha opened this issue 6 months ago • 3 comments

Hi @jcupitt,

I am trying to split a many-page image into a list of N separate images.

Code:

import pyvips

file_path = "/filesharemnt/testpdf.pdf"
DPI = 150.0

# n=-1 loads every page of the PDF into one tall image
multi_page_image = pyvips.Image.pdfload(file_path, n=-1, dpi=DPI)

total_pages = multi_page_image.get_n_pages()
print("total_pages", total_pages)

# Dump every metadata field attached to the loaded image
for field in multi_page_image.get_fields():
    print(f"{field}: {multi_page_image.get(field)}")

# Expecting one image per page here
individual_pages = multi_page_image.pagesplit()
print("\nlen(individual_pages) =", len(individual_pages))

output:

total_pages 925
width: 1275
height: 1622346
bands: 4
format: uchar
coding: none
interpretation: srgb
xoffset: 0
yoffset: 0
xres: 5.905511811023622
yres: 5.905511811023622
filename: /filesharemnt/testpdf.pdf
vips-loader: pdfload
page-height: 1650
pdf-n_pages: 925
n-pages: 925
pdf-producer: iOS Version 15.5 (Build 19F77) Quartz PDFContext; modified using iText® 5.4.1 ©2000-2012 1T3XT BVBA (AGPL-version)

len(individual_pages) = 1

Expected:

  • individual_pages should be a list of 925 individual page images.

Actual:

  • individual_pages has only 1 element, which is the same as multi_page_image but with a temp filename.
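A quick sanity check on the numbers in the output above may be relevant here. It is only a sketch, under the assumption (not confirmed against the pyvips source) that `pagesplit()` needs every page in the strip to have the same height. The helper name `uniform_page_layout` is made up for illustration; the three values are copied from the `height`, `page-height` and `n-pages` fields printed above.

```python
def uniform_page_layout(height, page_height, n_pages):
    """Return True if the strip is exactly n_pages pages of page_height rows each."""
    return page_height > 0 and height == page_height * n_pages


# Values copied from the metadata dump above
height = 1622346
page_height = 1650
n_pages = 925

print(uniform_page_layout(height, page_height, n_pages))  # False
print(height % page_height)                               # 396
print(height - page_height * n_pages)                     # 96096
```

The total height is not a whole multiple of `page-height`, and it is 96096 rows taller than 925 pages of 1650 rows would be. That would be consistent with this PDF containing pages of differing heights, although the output alone does not prove it.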

I noticed that this happens only with PDFs that have the producer shown in the output above. The other PDFs I tested have a different producer, and `.pagesplit()` works for them.
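A possible workaround in the meantime, sketched under the assumption that the affected PDFs contain pages of differing sizes: load each page on its own through the `page` and `n` arguments of `pdfload` instead of splitting a single tall image. The helper name `load_pages_individually` and its injectable `loader` parameter are hypothetical and exist only so the loop can be exercised without a real PDF.

```python
def load_pages_individually(path, n_pages, dpi=150.0, loader=None):
    """Load each PDF page as its own image instead of splitting one strip.

    `loader` defaults to pyvips.Image.pdfload; it is a parameter only so the
    loop can be tested with a stand-in (an illustrative choice, not part of
    pyvips).
    """
    if loader is None:
        import pyvips  # imported lazily so the helper can be tested without libvips
        loader = pyvips.Image.pdfload

    # page=i selects the (zero-based) page to start at, n=1 loads just that page
    return [loader(path, page=i, n=1, dpi=dpi) for i in range(n_pages)]


# Usage with the values from this report (requires pyvips and the PDF itself):
# pages = load_pages_individually("/filesharemnt/testpdf.pdf", 925, dpi=150.0)
# print(len(pages))
```

This opens the document once per page, so it will likely be slower than a single `n=-1` load, but each page keeps its own dimensions.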

OS details: I have only tested this in Debian 11 and Ubuntu Docker containers.

lsb_release -a:

Distributor ID:	Debian
Description:	Debian GNU/Linux 11 (bullseye)
Release:	11
Codename:	bullseye

uname -a:

Linux SandboxHost-638582921772039215 5.10.102.2-microsoft-standard #1 SMP Mon Mar 7 17:36:34 UTC 2022 x86_64 GNU/Linux

Python version: 3.10.14, pyvips version: 2.2.3

Could you please help?

TarunChakitha, Aug 03 '24 14:08