lopdf
Add relaxed mode (ignores things like false byte offsets in xref table)
Found another error for http://mirrors.ibiblio.org/CTAN/macros/latex/contrib/ksp-thesis/ksp-thesis.pdf which gives:
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (xref_and_trailer).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 267986 }") }
The startxref value in this file is wrong; it should be 267985.
pdfinfo doesn't notice it. All PDF viewers on my system are forgiving and don't complain.
I have more PDF documents on my local disk that show this error.
So, what to do? Perhaps lopdf should have a relaxed mode when parsing where such things will be accepted?
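To make the idea concrete, here is a minimal sketch of what a relaxed startxref lookup could do. It is not lopdf's API: `keyword_at`, `locate_xref` and the `window` parameter are hypothetical names, and the sketch assumes a classic `xref` table rather than a cross-reference stream. If the keyword is not at the declared offset, it looks a few bytes on either side, nearest first.

```rust
/// Hypothetical helper, not part of lopdf: true if the `xref` keyword starts
/// at `offset` as its own token (so the tail of `startxref` does not count).
fn keyword_at(data: &[u8], offset: usize) -> bool {
    // Approximation: ASCII whitespace stands in for PDF whitespace here.
    let starts_token = offset == 0
        || data.get(offset - 1).map_or(false, |b| b.is_ascii_whitespace());
    starts_token && data.get(offset..).map_or(false, |rest| rest.starts_with(b"xref"))
}

/// Hypothetical helper, not part of lopdf: return the offset where the xref
/// table really starts, searching up to `window` bytes around `declared`.
fn locate_xref(data: &[u8], declared: usize, window: usize) -> Option<usize> {
    // Strict case: the keyword is exactly where `startxref` says it is.
    if keyword_at(data, declared) {
        return Some(declared);
    }
    // Relaxed case: try offsets at increasing distance from the declared one.
    for distance in 1..=window {
        if let Some(before) = declared.checked_sub(distance) {
            if keyword_at(data, before) {
                return Some(before);
            }
        }
        if let Some(after) = declared.checked_add(distance) {
            if keyword_at(data, after) {
                return Some(after);
            }
        }
    }
    None
}
```

With a small window, a file whose startxref is off by one, like the one above, would still resolve, while a file with no xref table nearby would still be rejected.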
It is a little annoying that it prints these things to stdout with no way to turn this off, even in release mode.
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 37958).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 37958 }") }
Yes, this PDF is not 100% correct: references to some objects point one character before the actual object, at a newline character, but PDF viewers have learned to forgive these things.
At the very least, it would be better to use Rust logging for these things, like:
Err(err) => {
warn!("{:?}", err); // or error!
}
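As for the off-by-one object references mentioned above, a tolerant reader could skip leading whitespace before looking for the object header. The following is only a sketch under that assumption; `resolve_object_offset` is a hypothetical name, not a lopdf function.

```rust
/// Hypothetical helper, not part of lopdf: given the offset recorded in the
/// xref table, return the offset where the `<id> <generation> obj` header
/// actually begins, tolerating whitespace (e.g. a newline) in front of it.
fn resolve_object_offset(
    data: &[u8],
    declared: usize,
    id: u32,
    generation: u16,
) -> Option<usize> {
    let header = format!("{} {} obj", id, generation);
    // An offset past the end of the file cannot be repaired.
    let rest = data.get(declared..)?;
    // Relaxed step: count the whitespace bytes sitting before the header.
    let skipped = rest.iter().take_while(|b| b.is_ascii_whitespace()).count();
    if rest[skipped..].starts_with(header.as_bytes()) {
        Some(declared + skipped)
    } else {
        None
    }
}
```

A strict parser would require `skipped` to be zero; a relaxed one accepts the corrected offset and could log a warning instead of failing.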
I vote for making the error logging configurable and adding a relaxed parsing mode.
I am seeing a similar error while attempting to convert some PDFs to text using pdf-extract:
Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 16845).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 16995 }")
I can open the PDF that triggers this error with evince on a GNU/Linux distribution.
~~Should we break these out into 2 different issues?~~
See https://github.com/J-F-Liu/lopdf/issues/46 for the configurable logging.
So, perhaps consider adding a relaxed parsing option / default?