feat(multi-layout): implement form field copying and transformation for multi-page PDF to keep form data
Description of Changes
This pull request adds support for all common PDF AcroForm field types to multi-page PDF controller. Previously these forms were not kept, now however they are kept.
how it works
- The code use methods to identify, transform, and copy to new PDF variety of form fields, ensuring the output document is a accurate and fully interactive copy of the original.
Tests for multi page controller
- Small Junit tests to test basic functionality of the controller
- Added MultiPageLayoutControllerTest to verify merging logic, including validation for allowed
pagesPerSheetvalues and correct handling of perfect-square, 2-up, and 3-up layouts. - Tests ensure the controller rejects invalid pagesPerSheet values, generates PDFs with correct content types and filenames, and handles files with and without extensions appropriately.
Limitations/quirks
- It is sadly not possible to account for ALL possible form fields, this PR focuses on the more common AcroForm field types
- It is sadly NOT possible to copy Javascript to new PDF, since PDFBox does not support this, so this is only for forms, no JS support
- In both of these cases this PR introduces no change in behavior (tested), there is no negative impact from this PR
Example files using defaults
Before EDIT OoPdfFormExample_layoutChanged.pdf
After EDIT OoPdfFormExample_layoutChanged.pdf
Before Form_layoutChanged.pdf
After Form_layoutChanged.pdf
Before interactiveform_enabled_layoutChanged.pdf
After interactiveform_enabled_layoutChanged.pdf
Before Sample-Fillable-PDF_layoutChanged.pdf After Sample-Fillable-PDF_layoutChanged.pdf
UI
- No UI change whatsoever
Checklist
General
- [x] I have read the Contribution Guidelines
- [x] I have read the Stirling-PDF Developer Guide (if applicable)
- [ ] I have read the How to add new languages to Stirling-PDF (if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings
Documentation
- [ ] I have updated relevant docs on Stirling-PDF's doc repo (if functionality has heavily changed)
- [ ] I have read the section Add New Translation Tags (for new translation tags only)
UI Changes (if applicable)
- [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR)
Testing (if applicable)
- [x] I have tested my changes locally. Refer to the Testing Guide for more details.
finally getting round to going through all these PRs
Some review notes.. We’re mutating the source AcroForm by “cleaning up” widgets. That can surprise any later code that reads the same PDDocument. Let’s avoid changing the source and just skip bad widgets when copying. (If possible)
We scan the entire field tree to find the field for each widget. On big forms this can get slow. Build a widget→field map once and reuse it during the copy so lookups are constant time.
Widget rectangles are translated/scaled but not rotated to match the source page’s rotation. This will misplace fields on rotated inputs (e.g., landscape scans). We should apply the same rotation logic used for content when calculating widget positions.
Relying only on NeedAppearances is hit-or-miss across viewers. After setting values, we should refresh appearances so checkmarks/bullets reliably show up.
Also these PRs are huge for single class, I keep wondering if we can split these into Utils and how reusable it could ever be
Also these PRs are huge for single class, I keep wondering if we can split these into Utils and how reusable it could ever be
I can absolutely do this yes.
Let’s avoid changing the source and just skip bad widgets when copying. (If possible)
I don't this is possible, well it is necessary because of invisible and non-functional form fields.
We scan the entire field tree to find the field for each widget. On big forms this can get slow. Build a widget→field map once and reuse it during the copy so lookups are constant time.
Thanks for the suggestion, it sounds very good.
Widget rectangles are translated/scaled but not rotated to match the source page’s rotation. This will misplace fields on rotated inputs (e.g., landscape scans). We should apply the same rotation logic used for content when calculating widget positions.
yes agreed, it may cause problems. However my code is more or less consistent with other codebases on this topic, as far as I got from other PDF manipulation libraries (tickets/discussions), the implementation of this is that this is much easier said than done. It basically wildly inconsistent how different PDF tools achieve rotation with forms.
Relying only on NeedAppearances is hit-or-miss across viewers. After setting values, we should refresh appearances so checkmarks/bullets reliably show up.
Agreed, thanks for the suggestion
I'll update this PR, thanks for the review
Quickly tested rotated PDF, it is indeed buggy, I'll update to default back original behavior when the PDF is rotated (no forms in that case) since rotating forms would be quite the undertaking. (Possible improvement for the future)
Hi,
Feedback addressed in: 09dff8d
Also these PRs are huge for single class, I keep wondering if we can split these into Utils and how reusable it could ever be
Done. Logic moved to; stirling.software.common.util.FormUtils
I'll test with form PDFs to see if any controller can improved with the new Utils class.
We’re mutating the source AcroForm by “cleaning up” widgets. That can surprise any later code that reads the same PDDocument. Let’s avoid changing the source and just skip bad widgets when copying. (If possible)
Done, added more verbose null checking to avoid surprises but it should be straight forward transformation now
We scan the entire field tree to find the field for each widget. On big forms this can get slow. Build a widget→field map once and reuse it during the copy so lookups are constant time.
Done
Widget rectangles are translated/scaled but not rotated to match the source page’s rotation. This will misplace fields on rotated inputs (e.g., landscape scans). We should apply the same rotation logic used for content when calculating widget positions.
For now rotated pdf form are skipped. Possible future improvement (I'll look into it)
Relying only on NeedAppearances is hit-or-miss across viewers. After setting values, we should refresh appearances so checkmarks/bullets reliably show up.
Done
feel free to let me know if there is any other places for improvement :)
So after bit of digging the following controllers don't handle forms (and we can address these by using or extending formUtils):
- pdf-to-single-page
- adjust-contrast
- crop
- split
- scale (although these last 3 cases somewhat are complicated)
Obviously, it might always be feasible to implement this, however there are definitely endpoint that might benefit from more careful handling of forms.
This is more less my opinion, if not needed/wanted let me know :)
rs don't handle forms (and we can address these by using or extending formUtils):
- pdf-to-single-page
- adjust-contrast
- crop
- split
- scale (although these last 3 cases somewhat are complicated)
Obviously, it might always be feasible to implement this, however there are definitely endpoint that might benefit from more
Great find, certainly something to resolve in the future
Thanks :)
Looking forward on working on them in the future. I would like keep this PR "lighter", by addressing rest in other PRs if possible. If the FormUtils is good/multi-page PDF also good, then I am satisfied with this in the meantime.
Here some of the example files (not necessary important to check manually just, they are well examples for context):
- pdf-to-single-page: Sample-Fillable-PDF_merged_unsigned_singlePage.pdf
- adjust-contrast: Sample-Fillable-PDF_merged_unsigned_scaled.pdf
- crop: Sample-Fillable-PDF_merged_unsigned_cropped.pdf
- scale: Sample-Fillable-PDF_merged_unsigned_scaled-1.pdf