Retrieve image dimensions in PDF user space units
This feature request is an extension of https://github.com/pdfcpu/pdfcpu/issues/324.
Hi, thanks for maintaining such a great library and tool!
I am using the api.Images function to list all images in a PDF file. However, I would also like to retrieve the width and height of each image as it is placed on the PDF page. Currently, api.Images returns maps of model.Image values that contain only the intrinsic pixel size of each image.
Example PDF file: capybara-clones.pdf
This is a minimal example using api.Images:
package main

import (
	"fmt"
	"log/slog"
	"os"

	"github.com/pdfcpu/pdfcpu/pkg/api"
)

func main() {
	f, err := os.Open("capybara-clones.pdf")
	if err != nil {
		slog.Error("capybara-clones.pdf could not be opened", "error", err)
		return
	}
	defer f.Close()

	// nil page selection and nil configuration: all pages, default settings.
	images, err := api.Images(f, nil, nil)
	if err != nil {
		slog.Error("capybara-clones.pdf could not be parsed", "error", err)
		return
	}

	fmt.Printf("%+v\n", images)
	// Prints [map[5:{Reader:<nil> Name:X5 FileType: PageNr:1 ObjNr:5 Width:1920 Height:1440 Bpc:8 Cs:ICCBased Comp:3 IsImgMask:false HasImgMask:false HasSMask:false Thumb:false Interpol:false Size:1102034 Filter:DCTDecode DecodeParms:}]]
}
The output above only provides the intrinsic pixel dimensions (1920×1440), but not how large the image appears on the page.
For comparison, the pdfimages tool (from poppler) reports multiple placements of the same image, along with their x-ppi and y-ppi (pixels per inch):
$ pdfimages -list capybara-clones.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    1920  1440  icc     3   8  jpeg   no         5  0   471   470 1076K  13%
   1     1 image    1920  1440  icc     3   8  jpeg   no         5  0   471   839 1076K  13%
   1     2 image    1920  1440  icc     3   8  jpeg   no         5  0  1119   448 1076K  13%
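To show what these columns imply, here is a rough sketch that converts the pixel size and the reported ppi values back into a placed size in points. It assumes the default user space unit of 1/72 inch (no UserUnit entry on the page) and an unrotated image. The helper name placedSizePt is made up for this example and is not part of pdfcpu or poppler. Since pdfimages rounds the ppi values, the result is only approximate.

```go
package main

import "fmt"

// placedSizePt is a hypothetical helper: it converts intrinsic pixel
// dimensions and a resolution in pixels per inch into a placed size in
// points, assuming 1 user space unit = 1/72 inch.
func placedSizePt(pxW, pxH int, xPPI, yPPI float64) (w, h float64) {
	return float64(pxW) / xPPI * 72, float64(pxH) / yPPI * 72
}

func main() {
	// First placement from the pdfimages listing above.
	w, h := placedSizePt(1920, 1440, 471, 470)
	fmt.Printf("%.1f x %.1f pt\n", w, h) // prints "293.5 x 220.6 pt"
}
```

This is exactly the kind of value I would like to get from pdfcpu directly, without going through a second tool.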
Please consider extending api.Images, or introducing a new API, to expose the dimensions of each image as drawn on the page. This would allow for better analysis and auditing of PDF content where image scaling or resolution matters.
Thank you for considering this feature!
Calculating the final image dimensions in user space requires a deep dive into the rendering of page content. Remember that an image can be part of a form, which may itself be part of another form, and so on, each with its own transformation matrix attached.
pdfcpu is a PDF processor and, unlike other tools, is not in the business of page rendering. Therefore this is out of scope.
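To illustrate why nesting matters, here is a minimal sketch of the matrix bookkeeping involved. It is not pdfcpu code: the Matrix type, Concat, and ImageSize are names invented for this example, and the matrix values are made up. It assumes the usual PDF convention that an image is painted into the unit square and that each cm operator or form matrix is pre-multiplied onto the current transformation matrix. A real implementation would also have to interpret the content streams, track the graphics state stack (q/Q), and recurse into form XObjects.

```go
package main

import (
	"fmt"
	"math"
)

// Matrix holds the six numbers [a b c d e f] of a PDF transformation matrix.
type Matrix [6]float64

// Concat returns m × n: m is applied first, then n. This mirrors how a new
// matrix is concatenated onto the existing CTM.
func (m Matrix) Concat(n Matrix) Matrix {
	return Matrix{
		m[0]*n[0] + m[1]*n[2],
		m[0]*n[1] + m[1]*n[3],
		m[2]*n[0] + m[3]*n[2],
		m[2]*n[1] + m[3]*n[3],
		m[4]*n[0] + m[5]*n[2] + n[4],
		m[4]*n[1] + m[5]*n[3] + n[5],
	}
}

// ImageSize returns the lengths of the unit square's edges after mapping
// through ctm, i.e. the drawn width and height in user space units.
func ImageSize(ctm Matrix) (w, h float64) {
	return math.Hypot(ctm[0], ctm[1]), math.Hypot(ctm[2], ctm[3])
}

func main() {
	page := Matrix{1, 0, 0, 1, 0, 0}       // identity CTM at page level
	form := Matrix{0.5, 0, 0, 0.5, 36, 36} // made-up form placement, scaled by 0.5
	img := Matrix{400, 0, 0, 300, 0, 0}    // made-up cm issued inside the form before Do

	// Innermost matrix first, then each enclosing level.
	ctm := img.Concat(form).Concat(page)
	w, h := ImageSize(ctm)
	fmt.Printf("%.1f x %.1f\n", w, h) // prints "200.0 x 150.0"
}
```

The arithmetic itself is small. The effort lies in walking every content stream to find out which matrices are in effect at each image placement.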