pdfbox
Mirror of Apache PDFBox
The xmpbox library is nice, but there are PDF files out in the wild that fail parsing. For example, dc.create is a Bag instead of a Seq. Ideally the parser...
This pull request is discussed in Jira ticket: https://issues.apache.org/jira/browse/PDFBOX-4073 Our take on this: There could be a need to work with millimetres or inches instead of points. @THausherr commented that...
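To make the unit question concrete, here is a minimal sketch of converting between millimetres, inches, and points. It assumes only the standard definitions (72 points per inch, 25.4 mm per inch). The class name `PageUnits` and its methods are hypothetical helpers for illustration and are not part of the PDFBox API or of this pull request.

```java
// Hypothetical helper, not part of PDFBox: converts page measurements to and from points.
public final class PageUnits {
    // A PDF point (default user space unit) is 1/72 inch.
    public static final double POINTS_PER_INCH = 72.0;
    // One inch is exactly 25.4 millimetres.
    public static final double MM_PER_INCH = 25.4;

    private PageUnits() {
    }

    public static double inchesToPoints(double inches) {
        return inches * POINTS_PER_INCH;
    }

    public static double mmToPoints(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    public static double pointsToMm(double points) {
        return points / POINTS_PER_INCH * MM_PER_INCH;
    }

    public static void main(String[] args) {
        // Print an A4 page size (210 mm x 297 mm) expressed in points.
        System.out.println(mmToPoints(210) + " x " + mmToPoints(297));
    }
}
```

Keeping the conversion in one small helper means callers can state dimensions in the unit they think in, while everything handed to the PDF layer stays in points.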
Addresses issue https://issues.apache.org/jira/browse/PDFBOX-5025 Unreading the trailing 'e' from the endobject string allows parsing to continue and complete as expected.
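The general "unread" technique mentioned here can be sketched with the JDK's `java.io.PushbackInputStream`. This is not the PDFBox parser code or the actual patch; the class `UnreadSketch`, its methods, and the sample input `42endobj` are assumptions chosen only to show how pushing a byte back lets the next token be read in full.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of unreading a byte; not taken from PDFBox.
public final class UnreadSketch {
    private UnreadSketch() {
    }

    // Reads consecutive ASCII digits; the first non-digit is pushed back.
    static String readDigits(PushbackInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c < '0' || c > '9') {
                in.unread(c); // give the byte back to the stream
                break;
            }
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Reads consecutive lowercase ASCII letters; the first other byte is pushed back.
    static String readLetters(PushbackInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c < 'a' || c > 'z') {
                in.unread(c);
                break;
            }
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Splits input such as "42endobj" into a number token and a keyword token.
    static String[] parse(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        try (PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(bytes))) {
            String number = readDigits(in);
            String keyword = readLetters(in);
            return new String[] {number, keyword};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] tokens = parse("42endobj");
        System.out.println(tokens[0] + " | " + tokens[1]);
    }
}
```

Without the `unread` call in `readDigits`, the `e` that ends the number would be consumed and the keyword reader would see only `ndobj`.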
Addresses issue https://issues.apache.org/jira/browse/PDFBOX-5026 Rebuilding the trailer when the pages item is missing can allow the building of the PDF when lenient parsing is enabled.
https://issues.apache.org/jira/browse/PDFBOX-3812
This pull request is discussed in Jira ticket: https://issues.apache.org/jira/browse/PDFBOX-4952 I implemented a basic starting point for PDF compression based on PDFBox 2.0.22-SNAPSHOT. I want to use this ticket,...
If reading is done using Windows-1252 and writing using UTF-8, then a PDF containing Windows-1252 encoded XObject dictionary names will be broken after doing a load and save: PDDocument document = PDDocument.load(sourcePath);...
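The mismatch described here can be reproduced with plain JDK classes, independent of PDFBox. The sketch below is an illustration under assumptions: the class `NameEncodingSketch`, the method `reencode`, and the sample name bytes are hypothetical and are not taken from the pull request. It shows that decoding bytes as Windows-1252 and re-encoding them as UTF-8 changes the byte sequence, while using the same charset in both directions preserves it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demonstration of a read/write charset mismatch; not PDFBox code.
public final class NameEncodingSketch {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    private NameEncodingSketch() {
    }

    // Decodes raw bytes with one charset and encodes the result with another.
    static byte[] reencode(byte[] raw, Charset readAs, Charset writeAs) {
        return new String(raw, readAs).getBytes(writeAs);
    }

    public static void main(String[] args) {
        // Sample name "Im" followed by byte 0xE4, which is a-umlaut in Windows-1252.
        byte[] original = {'I', 'm', (byte) 0xE4};

        byte[] mismatched = reencode(original, WINDOWS_1252, StandardCharsets.UTF_8);
        byte[] matched = reencode(original, WINDOWS_1252, WINDOWS_1252);

        System.out.println("mismatched equals original: " + Arrays.equals(original, mismatched));
        System.out.println("matched equals original: " + Arrays.equals(original, matched));
    }
}
```

Any reference elsewhere in the file that still uses the original bytes would no longer match a name rewritten in a different encoding, which is consistent with the breakage the description reports.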
There is no need to allocate a new ArrayList here. Avoiding it reduces text extraction time from 16 seconds to 14 seconds on a 4.2M PDF.
Draft solution, awaiting feedback. Our notes: This fix is currently being used in us.pdinc.products.cresaptown