pdfbox
Add option to skip corrupt PDFs in PDFMergerUtility with improved exception handling
What This PR Does
This pull request improves the robustness and debuggability of PDFMergerUtility by:
- Adding a skipCorruptFiles flag
  - Allows users to skip unreadable or corrupt PDF files during merge.
  - Default behavior remains unchanged (i.e., throws on error).
- Wrapping IOException with source context
  - Converts vague errors like:
      IOException: Could not parse object stream
    into more useful messages like:
      IOException: Failed to load PDF from source: /path/to/file.pdf
  - Helps identify exactly which file failed.
- Applied consistently in both merge modes
  - optimizedMergeDocuments(...)
  - legacyMergeDocuments(...)
  - Added warning logs when skipping files.
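The skip-or-wrap behavior described above can be illustrated with a minimal, library-free sketch. It is not the code in this PR and does not use PDFBox: the class SkipCorruptMergeSketch, the Loader interface, the loadAll method, and the wording of the warning log are all hypothetical stand-ins. Only the flag name, its default, and the "Failed to load PDF from source: ..." message format are taken from the description.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for the merge loop; not PDFBox's actual implementation.
class SkipCorruptMergeSketch {
    private static final Logger LOG =
            Logger.getLogger(SkipCorruptMergeSketch.class.getName());

    /** Hypothetical loader: stands in for whatever parses a single PDF source. */
    interface Loader<T> {
        T load(String source) throws IOException;
    }

    // Mirrors the flag from the description; false keeps the throw-on-error default.
    private boolean skipCorruptFiles = false;

    void setSkipCorruptFiles(boolean skipCorruptFiles) {
        this.skipCorruptFiles = skipCorruptFiles;
    }

    /** Loads each source in order and returns the ones that loaded successfully. */
    <T> List<T> loadAll(List<String> sources, Loader<T> loader) throws IOException {
        List<T> loaded = new ArrayList<>();
        for (String source : sources) {
            try {
                loaded.add(loader.load(source));
            } catch (IOException e) {
                if (!skipCorruptFiles) {
                    // Name the failing source and keep the original error as the cause.
                    throw new IOException("Failed to load PDF from source: " + source, e);
                }
                // Assumed log wording; the PR only states that a warning is logged.
                LOG.warning("Skipping corrupt PDF: " + source + " (" + e.getMessage() + ")");
            }
        }
        return loaded;
    }
}
```

With the flag left at its default, the first unreadable source aborts the loop with an exception that names that source. After setSkipCorruptFiles(true), the same input yields the remaining sources and one warning per skipped file.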
Why This Helps
- Improves debuggability: pinpoints which file caused the failure.
- Makes batch operations resilient: avoids total failure from one bad input.
- Scales better: suitable for bulk merging scenarios.
- Does not break existing behavior: opt-in via setSkipCorruptFiles(true).