lopdf

Add relaxed mode (ignores things like false byte offsets in xref table)

Open manfredlotz opened this issue 6 years ago • 5 comments

Found another error for http://mirrors.ibiblio.org/CTAN/macros/latex/contrib/ksp-thesis/ksp-thesis.pdf which gives:

Custom { kind: InvalidData, error: StringError("Not a valid PDF file (xref_and_trailer).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 267986 }") }

manfredlotz avatar Oct 06 '18 17:10 manfredlotz

The startxref value is wrong in this file; it should be 267985.

J-F-Liu avatar Oct 07 '18 01:10 J-F-Liu

pdfinfo doesn't notice it. All PDF viewers on my system are forgiving and don't complain.

Locally on my hard disk I have more PDF documents showing this error.

So, what to do? Perhaps lopdf should have a relaxed mode when parsing where such things will be accepted?
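
To make the idea concrete, here is a rough sketch of what a relaxed lookup might do when the startxref value is off by a byte or two: check the declared offset first, then scan a small window around it for the `xref` keyword. It assumes a classic cross-reference table (not a cross-reference stream). `locate_xref`, `is_xref_at`, and the `window` parameter are hypothetical names for illustration and are not part of lopdf's API.

```rust
/// Returns true if the `xref` keyword starts at `pos` and is not merely
/// the tail of the `startxref` keyword.
fn is_xref_at(buf: &[u8], pos: usize) -> bool {
    match buf.get(pos..) {
        Some(rest) if rest.starts_with(b"xref") => {
            !(pos >= 5 && &buf[pos - 5..pos] == b"start")
        }
        _ => false,
    }
}

/// Hypothetical helper (not lopdf API): find the start of the xref table
/// when the declared `startxref` offset may be slightly wrong.
/// `window` is the maximum distance, in bytes, to search on either side.
fn locate_xref(buf: &[u8], declared: usize, window: usize) -> Option<usize> {
    // Strict case: the declared offset is already correct.
    if is_xref_at(buf, declared) {
        return Some(declared);
    }
    // Relaxed case: try offsets at increasing distance from the declared one.
    for delta in 1..=window {
        if let Some(pos) = declared.checked_sub(delta) {
            if is_xref_at(buf, pos) {
                return Some(pos);
            }
        }
        if let Some(pos) = declared.checked_add(delta) {
            if is_xref_at(buf, pos) {
                return Some(pos);
            }
        }
    }
    None
}
```

With a small window this would accept a file like the one above, where the stored value is one byte away from the real table, and still return `None` for files with no table near the declared offset.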

manfredlotz avatar Oct 07 '18 03:10 manfredlotz

It is a little annoying that it prints these things to stdout with no way to turn this off, even in release mode.

Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 37958).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 37958 }") }

Yes, this PDF is not 100% correct: references to some objects point one character before the actual object, at a newline character. But PDF viewers have learned to forgive these things.

At the very least, it would be better to use Rust logging for these things, like:

	Err(err) => {
		warn!("{:?}", err); // or error!
	}
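
As for the object references that land on the newline just before the object, a forgiving reader could skip leading whitespace at the declared offset before parsing. The following is only a sketch under that assumption; `adjust_object_offset` and `is_pdf_whitespace` are hypothetical helpers, not lopdf functions. The whitespace set is the one defined by the PDF specification (NUL, HT, LF, FF, CR, SP).

```rust
/// PDF white-space characters: NUL, HT, LF, FF, CR and SP.
fn is_pdf_whitespace(byte: u8) -> bool {
    matches!(byte, 0x00 | 0x09 | 0x0A | 0x0C | 0x0D | 0x20)
}

/// Hypothetical helper (not lopdf API): move a declared object offset
/// forward past any leading whitespace. The adjusted offset is accepted
/// only if a digit follows, since an indirect object starts with its
/// object number ("12 0 obj").
fn adjust_object_offset(buf: &[u8], declared: usize) -> Option<usize> {
    // Out-of-range offsets cannot be repaired.
    let rest = buf.get(declared..)?;
    let skipped = rest.iter().take_while(|&&b| is_pdf_whitespace(b)).count();
    let start = declared + skipped;
    match buf.get(start) {
        Some(b) if b.is_ascii_digit() => Some(start),
        _ => None,
    }
}
```

This only handles references that point too early, at whitespace; a reference that points into the middle of an object would still be rejected.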

misos1 avatar Oct 24 '18 20:10 misos1

I vote for making the error logging configurable and for adding a relaxed parsing mode. I am seeing a similar error: (Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 16845).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 16995 }")) while attempting to convert some PDFs to text using pdf-extract. I can also open the PDF that throws this error with evince on a GNU/Linux distribution. ~~Should we break these out into 2 different issues?~~ See https://github.com/J-F-Liu/lopdf/issues/46 for the configurable logging. So, perhaps consider adding a relaxed parsing option, possibly as the default?

ghost avatar Mar 04 '19 13:03 ghost