PyMuPDF

Ability to mirror PDF pages

Open rrthomas opened this issue 1 month ago • 9 comments

Describe the solution you'd like

When calling Page.show_pdf_page, it is currently possible to perform various operations on the inserted page (specifically, the page can be translated and scaled via the rect argument, and rotated via the rotate argument), but I can't see any way to apply arbitrary transforms.

Describe alternatives you've considered

I can't see another way to do this, other than by rendering the PDF and transforming the render.

Additional context

I'm the maintainer of PSUtils, a PostScript and PDF document manipulator. It currently uses PyPDF to work with PDF files. I would like to use PyMuPDF instead for better performance.

It was pretty easy to rework PSUtils to use PyMuPDF, and I have almost the entire test suite passing, but one of the remaining failures is for horizontally and vertically mirroring pages, which, I realise at this late stage, MuPDF seems not to be able to do.

I'm a bit surprised! Maybe I overlooked something? But I have looked at the C API too and I can't find a way to do it, so maybe it's an upstream problem.

In any case, it's easy to do in PyPDF, so I don't suppose this is a limitation of PDF itself.

Thanks for MuPDF! I hope to be able to use it eventually.

rrthomas avatar Nov 30 '25 14:11 rrthomas

Thanks for your interest in PyMuPDF! It should be possible to do this. I suppose you are referring to left-right / top-bottom flips?

JorjMcKie avatar Dec 01 '25 08:12 JorjMcKie

Thanks for the quick response! Left-right/top-bottom flips is indeed what I'm referring to, though it would seem reasonable to be able to apply any regular PDF-style transformation matrix.

rrthomas avatar Dec 01 '25 10:12 rrthomas

> Thanks for the quick response! Left-right/top-bottom flips is indeed what I'm referring to, though it would seem reasonable to be able to apply any regular PDF-style transformation matrix.

Yes - I didn't mean to restrict the allowable matrix type unnecessarily. The only thing to take seriously is whether we want to allow the final result to leave the target rectangle - i.e. matrix.e and matrix.f should probably be forced to be zero.

JorjMcKie avatar Dec 01 '25 10:12 JorjMcKie

> Yes - I didn't mean to restrict the allowable matrix type unnecessarily. The only thing to take seriously is whether we want to allow the final result to leave the target rectangle - i.e. matrix.e and matrix.f should probably be forced to be zero.

That sounds sensible, in the context of PyMuPDF.

I must admit, I find MuPDF rather complicated in this regard. With PyPDF, one just specifies the 6-element transformation matrix. With PyMuPDF, I find I have to both rotate my transformation matrix and specify the rotate argument to show_pdf_page.

It took me a long time to work out that I couldn't specify flipping/mirroring because what I was actually supplying to show_pdf_page was a bounding box, so flipping/mirroring it has no effect, as it's the same bounding box afterwards (i.e. it doesn't matter which order its coordinates are given in); whereas translating and scaling it did work, because they change the box.
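To illustrate the point about bounding boxes, here is a minimal pure-Python sketch. It does not use PyMuPDF, and the helper names `apply`, `corners` and `bbox` are made up for illustration. It mirrors the corners of a rectangle about its own vertical centre line and shows that the normalised bounding box comes out unchanged, whereas a translation does change it.

```python
def apply(m, p):
    """Apply a PDF-style matrix (a, b, c, d, e, f) to a point (x, y)."""
    a, b, c, d, e, f = m
    x, y = p
    return (a * x + c * y + e, b * x + d * y + f)


def corners(r):
    """Return the four corners of a rectangle given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = r
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def bbox(points):
    """Return the normalised bounding box (x0, y0, x1, y1) of some points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


rect = (100, 200, 300, 500)

# Left-right mirror about the rectangle's vertical centre line:
# x' = -x + (x0 + x1), y' = y
hflip = (-1, 0, 0, 1, rect[0] + rect[2], 0)

# Plain translation by 50 units to the right, for comparison.
shift = (1, 0, 0, 1, 50, 0)

flipped = bbox([apply(hflip, p) for p in corners(rect)])
shifted = bbox([apply(shift, p) for p in corners(rect)])

print(flipped == rect)  # the mirrored box is the same box
print(shifted == rect)  # the translated box is a different box
```

The mirror only swaps which corner is which, so once the result is reduced to a bounding box the flip is lost; only changes to position or size survive.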

So it would be good, in particular for more general uses, to have a different method that just takes a Matrix, something like:

show_transformed_pdf_page(matrix, docsrc, pno=0, overlay=True, oc=0, clip=None)

So that the matrix argument replaces rect, keep_proportion and rotate, and all of that information comes from the 6-element matrix.
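As a rough sketch of why a single matrix could carry all of that information, the pure-Python snippet below composes a scale, a 90-degree rotation, a left-right flip and a translation into one 6-element matrix. It assumes the usual PDF convention (row vector times matrix, operations applied left to right); `concat` and `apply` are illustrative helpers, not PyMuPDF functions.

```python
def concat(m1, m2):
    """Return the matrix that applies m1 first, then m2 (PDF convention)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def apply(m, p):
    """Apply a PDF-style matrix (a, b, c, d, e, f) to a point (x, y)."""
    a, b, c, d, e, f = m
    x, y = p
    return (a * x + c * y + e, b * x + d * y + f)


scale = (0.5, 0, 0, 0.5, 0, 0)      # shrink to half size
rot90 = (0, 1, -1, 0, 0, 0)         # x' = -y, y' = x
hflip = (-1, 0, 0, 1, 0, 0)         # mirror about the y axis
translate = (1, 0, 0, 1, 100, 50)   # move to the target position

# One matrix that does all four steps in order.
combined = concat(concat(concat(scale, rot90), hflip), translate)

# Applying the steps one at a time gives the same point as the
# combined matrix.
point = (10, 20)
stepwise = point
for m in (scale, rot90, hflip, translate):
    stepwise = apply(m, stepwise)

print(combined)
print(apply(combined, point) == stepwise)
```

The flip is just another factor in the product, which is why a matrix-based method would not need separate hflip or vflip options.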

For what I'm asking for, it would seem simpler to add hflip and vflip boolean arguments to show_pdf_page, as if you add a matrix argument, then either you have to specify (as you suggest) that some elements are ignored, or it becomes much more complicated to specify the behaviour of show_pdf_page.

rrthomas avatar Dec 01 '25 10:12 rrthomas

Valid considerations! Still, the real situation can be quite complex:

  • source and target page can both be rotated
  • do we really want to prohibit rotation just because flipping is desired?
  • similar for keep aspect ratio yes/no.

The other thing is that the source page insertion can be restricted to a sub-rectangle ("clip"). Currently, there is no way but to copy over the full page and only show the clipped source part. This leads to surprising (undesired) effects when the target page's text is being extracted later. Etc.

JorjMcKie avatar Dec 01 '25 12:12 JorjMcKie

> Valid considerations! Still, the real situation can be quite complex:
>
> * source and target page can both be rotated

I'm not sure what difference this makes? In this context, the target page is considered after any transformations have been applied to it. The source page is transformed by some matrix and imposed on the already-transformed target page.

> * do we really want to prohibit rotation just because flipping is desired?

No!

> * similar for keep aspect ratio yes/no.

Again, no!

I don't think I suggested prohibiting anything anywhere?

To restate what I'm suggesting in broad-brush terms: currently, show_pdf_page has some arbitrary limitations on the transformations that can be applied to the page being imposed on the current page. I'm suggesting adding an API that simply takes a Matrix to apply to the page being drawn. This covers the general case.

I also suggested adding some options to show_pdf_page to cover the specific simple case of vertical/horizontal flips. But I now think this is a bad idea; show_pdf_page is already too complicated and confusing.

> The other thing is that the source page insertion can be restricted to a sub-rectangle ("clip"). Currently, there is no way but to copy over the full page and only show the clipped source part. This leads to surprising (undesired) effects when the target page's text is being extracted later. Etc.

I don't really understand this. Applying a clip path is a presentation operation; there's no more reason to expect that to change the content than e.g. translating content off the page.

I'm sorry if I've missed something, but doing this is all entirely straightforward and unsurprising with PyPDF; it seems to me that the complexity here is an artefact of the way that MuPDF works, rather than a real problem.

Here's the PyPDF API: https://pypdf.readthedocs.io/en/latest/modules/PageObject.html#pypdf._page.PageObject.merge_transformed_page

This merges a source page into a target page. Note that any transformation applied to the target page has already happened, so we don't need to worry about it. The source page is simply transformed by the given matrix, then drawn on top of the target page. (This lacks the functionality of the overlay and oc arguments to show_pdf_page, but they could be added without complicating matters.)
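For concreteness, here is a small pure-Python sketch of the 6-element matrices that mirror a page within its own box. It does not call PyPDF or PyMuPDF; the helper names and the A4-like page size are assumptions for illustration. A tuple of this shape is the kind of value that could be passed as a transformation matrix to a matrix-based merge method.

```python
def hflip_matrix(width):
    """Left-right mirror that keeps a page of the given width in place."""
    return (-1, 0, 0, 1, width, 0)


def vflip_matrix(height):
    """Top-bottom mirror that keeps a page of the given height in place."""
    return (1, 0, 0, -1, 0, height)


def transform(m, x, y):
    """Apply a PDF-style matrix (a, b, c, d, e, f) to the point (x, y)."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


# Assumed page size in points (roughly A4), origin at the bottom-left.
W, H = 595, 842
top_left = (0, H)

mirrored_h = transform(hflip_matrix(W), *top_left)  # moves to the top-right corner
mirrored_v = transform(vflip_matrix(H), *top_left)  # moves to the bottom-left corner

print(mirrored_h, mirrored_v)
```

Note that both matrices need a non-zero translation term (e or f) to keep the mirrored content on the page when the mirror is taken about the origin.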

I also appreciate that there's a consideration of the degree to which PyMuPDF adds its own way of doing things, rather than simply wrapping MuPDF.

rrthomas avatar Dec 01 '25 12:12 rrthomas

It would be good to know what your plans are for this issue: my options seem to be:

  • Wait until it is resolved, then switch to mupdf (my preferred option!)
  • Add mupdf as a dependency to PSUtils, and fall back to PyPDF in cases where flipping is used (this will be a bit of a PITA from a maintenance point of view)

(I've excluded getting stuck into MuPDF at the moment, as I don't think the amount of extra study that would require is a good use of my time versus maintaining code I already have some grasp of!)

rrthomas avatar Dec 08 '25 18:12 rrthomas

Your request is on our TODO list and we certainly will improve the flexibility of the method. But we cannot yet assign a schedule to this: there are too many other things going on currently.

JorjMcKie avatar Dec 10 '25 10:12 JorjMcKie

That's great, thanks!

rrthomas avatar Dec 10 '25 11:12 rrthomas