
Add option to skip corrupt PDFs in PDFMergerUtility with improved exception handling

Open SwethaMuthuvel opened this issue 6 months ago • 2 comments

What This PR Does

This pull request improves the robustness and debuggability of PDFMergerUtility by:

  1. Adding a skipCorruptFiles flag

    • Allows users to skip unreadable or corrupt PDF files during merge.
    • Default behavior remains unchanged (i.e., throws on error).
  2. Wrapping IOException with source context

    • Converts vague errors like:
      IOException: Could not parse object stream
      
      into more useful messages like:
      IOException: Failed to load PDF from source: /path/to/file.pdf
      
    • Helps identify exactly which file failed.
  3. Applying both changes in both merge modes, with a warning logged whenever a file is skipped

    • optimizedMergeDocuments(...)
    • legacyMergeDocuments(...)
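
The skip-or-wrap behavior described above can be sketched roughly as follows. This is a minimal, PDFBox-free illustration and not the code from this PR: the class SkipCorruptSketch, the Loader interface, and loadAll are hypothetical names, and keeping the original exception as the cause is an assumption rather than something the description states.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from PDFBox or from this PR's diff.
final class SkipCorruptSketch {

    /** Stand-in for whatever actually parses a PDF; may throw a vague IOException. */
    interface Loader {
        void load(String source) throws IOException;
    }

    /**
     * Tries each source in order and returns the ones that loaded.
     * On failure: skips with a warning if skipCorruptFiles is true,
     * otherwise rethrows with the source name attached.
     */
    static List<String> loadAll(List<String> sources, Loader loader, boolean skipCorruptFiles)
            throws IOException {
        List<String> loaded = new ArrayList<>();
        for (String source : sources) {
            try {
                loader.load(source);
                loaded.add(source);
            } catch (IOException e) {
                if (skipCorruptFiles) {
                    // Opt-in path: warn and move on to the next input.
                    System.err.println("WARN: skipping unreadable PDF: " + source
                            + " (" + e.getMessage() + ")");
                    continue;
                }
                // Default path: fail, but say which source failed.
                // Assumption: the original exception is kept as the cause.
                throw new IOException("Failed to load PDF from source: " + source, e);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        // Simulated parser that rejects any source whose name contains "corrupt".
        Loader loader = source -> {
            if (source.contains("corrupt")) {
                throw new IOException("Could not parse object stream");
            }
        };
        List<String> sources = Arrays.asList("a.pdf", "corrupt.pdf", "b.pdf");
        try {
            System.out.println(loadAll(sources, loader, true));
            loadAll(sources, loader, false);
        } catch (IOException e) {
            System.out.println(e.getMessage() + " / cause: " + e.getCause().getMessage());
        }
    }
}
```

With the flag off, the first bad input still aborts the whole run, which matches the unchanged default described in item 1; the only difference is that the message names the failing source.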

Why This Helps

  • Improves debuggability: the error message names the file that caused the failure.
  • Makes batch operations resilient: one bad input no longer fails the whole merge.
  • Suits bulk merging: a large batch can complete even when some inputs are unreadable.
  • Does not break existing behavior: skipping is opt-in via setSkipCorruptFiles(true).

SwethaMuthuvel, Jul 04 '25 07:07