Stirling-PDF icon indicating copy to clipboard operation
Stirling-PDF copied to clipboard

feat(multi-layout): implement form field copying and transformation for multi-page PDF to keep form data

Open balazs-szucs opened this issue 4 months ago • 8 comments

Description of Changes

This pull request adds support for all common PDF AcroForm field types to multi-page PDF controller. Previously these forms were not kept, now however they are kept.

how it works

  • The code use methods to identify, transform, and copy to new PDF variety of form fields, ensuring the output document is a accurate and fully interactive copy of the original.

Tests for multi page controller

  • Small Junit tests to test basic functionality of the controller
  • Added MultiPageLayoutControllerTest to verify merging logic, including validation for allowed pagesPerSheet values and correct handling of perfect-square, 2-up, and 3-up layouts.
  • Tests ensure the controller rejects invalid pagesPerSheet values, generates PDFs with correct content types and filenames, and handles files with and without extensions appropriately.

Limitations/quirks

  • It is sadly not possible to account for ALL possible form fields, this PR focuses on the more common AcroForm field types
  • It is sadly NOT possible to copy Javascript to new PDF, since PDFBox does not support this, so this is only for forms, no JS support
  • In both of these cases this PR introduces no change in behavior (tested), there is no negative impact from this PR

Example files using defaults

Before EDIT OoPdfFormExample_layoutChanged.pdf

After EDIT OoPdfFormExample_layoutChanged.pdf

Before Form_layoutChanged.pdf

After Form_layoutChanged.pdf

Before interactiveform_enabled_layoutChanged.pdf

After interactiveform_enabled_layoutChanged.pdf

Before Sample-Fillable-PDF_layoutChanged.pdf After Sample-Fillable-PDF_layoutChanged.pdf

UI

  • No UI change whatsoever

Checklist

General

Documentation

UI Changes (if applicable)

  • [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR)

Testing (if applicable)

  • [x] I have tested my changes locally. Refer to the Testing Guide for more details.

balazs-szucs avatar Aug 27 '25 14:08 balazs-szucs

finally getting round to going through all these PRs

Some review notes.. We’re mutating the source AcroForm by “cleaning up” widgets. That can surprise any later code that reads the same PDDocument. Let’s avoid changing the source and just skip bad widgets when copying. (If possible)

We scan the entire field tree to find the field for each widget. On big forms this can get slow. Build a widget→field map once and reuse it during the copy so lookups are constant time.

Widget rectangles are translated/scaled but not rotated to match the source page’s rotation. This will misplace fields on rotated inputs (e.g., landscape scans). We should apply the same rotation logic used for content when calculating widget positions.

Relying only on NeedAppearances is hit-or-miss across viewers. After setting values, we should refresh appearances so checkmarks/bullets reliably show up.

Frooodle avatar Sep 04 '25 13:09 Frooodle

Also these PRs are huge for single class, I keep wondering if we can split these into Utils and how reusable it could ever be

Frooodle avatar Sep 04 '25 13:09 Frooodle

Also these PRs are huge for single class, I keep wondering if we can split these into Utils and how reusable it could ever be

I can absolutely do this yes.

Let’s avoid changing the source and just skip bad widgets when copying. (If possible)

I don't this is possible, well it is necessary because of invisible and non-functional form fields.

We scan the entire field tree to find the field for each widget. On big forms this can get slow. Build a widget→field map once and reuse it during the copy so lookups are constant time.

Thanks for the suggestion, it sounds very good.

Widget rectangles are translated/scaled but not rotated to match the source page’s rotation. This will misplace fields on rotated inputs (e.g., landscape scans). We should apply the same rotation logic used for content when calculating widget positions.

yes agreed, it may cause problems. However my code is more or less consistent with other codebases on this topic, as far as I got from other PDF manipulation libraries (tickets/discussions), the implementation of this is that this is much easier said than done. It basically wildly inconsistent how different PDF tools achieve rotation with forms.

Relying only on NeedAppearances is hit-or-miss across viewers. After setting values, we should refresh appearances so checkmarks/bullets reliably show up.

Agreed, thanks for the suggestion

I'll update this PR, thanks for the review

balazs-szucs avatar Sep 04 '25 14:09 balazs-szucs

Quickly tested rotated PDF, it is indeed buggy, I'll update to default back original behavior when the PDF is rotated (no forms in that case) since rotating forms would be quite the undertaking. (Possible improvement for the future)

balazs-szucs avatar Sep 04 '25 14:09 balazs-szucs

Hi,

Feedback addressed in: 09dff8d

Also these PRs are huge for single class, I keep wondering if we can split these into Utils and how reusable it could ever be

Done. Logic moved to; stirling.software.common.util.FormUtils

I'll test with form PDFs to see if any controller can improved with the new Utils class.

We’re mutating the source AcroForm by “cleaning up” widgets. That can surprise any later code that reads the same PDDocument. Let’s avoid changing the source and just skip bad widgets when copying. (If possible)

Done, added more verbose null checking to avoid surprises but it should be straight forward transformation now

We scan the entire field tree to find the field for each widget. On big forms this can get slow. Build a widget→field map once and reuse it during the copy so lookups are constant time.

Done

Widget rectangles are translated/scaled but not rotated to match the source page’s rotation. This will misplace fields on rotated inputs (e.g., landscape scans). We should apply the same rotation logic used for content when calculating widget positions.

For now rotated pdf form are skipped. Possible future improvement (I'll look into it)

Relying only on NeedAppearances is hit-or-miss across viewers. After setting values, we should refresh appearances so checkmarks/bullets reliably show up.

Done

feel free to let me know if there is any other places for improvement :)

balazs-szucs avatar Sep 04 '25 19:09 balazs-szucs

So after bit of digging the following controllers don't handle forms (and we can address these by using or extending formUtils):

  • pdf-to-single-page
  • adjust-contrast
  • crop
  • split
  • scale (although these last 3 cases somewhat are complicated)

Obviously, it might always be feasible to implement this, however there are definitely endpoint that might benefit from more careful handling of forms.

This is more less my opinion, if not needed/wanted let me know :)

balazs-szucs avatar Sep 05 '25 15:09 balazs-szucs

rs don't handle forms (and we can address these by using or extending formUtils):

  • pdf-to-single-page
  • adjust-contrast
  • crop
  • split
  • scale (although these last 3 cases somewhat are complicated)

Obviously, it might always be feasible to implement this, however there are definitely endpoint that might benefit from more

Great find, certainly something to resolve in the future

Frooodle avatar Sep 06 '25 19:09 Frooodle

Thanks :)

Looking forward on working on them in the future. I would like keep this PR "lighter", by addressing rest in other PRs if possible. If the FormUtils is good/multi-page PDF also good, then I am satisfied with this in the meantime.

Here some of the example files (not necessary important to check manually just, they are well examples for context):

balazs-szucs avatar Sep 06 '25 19:09 balazs-szucs