
Embedded Images do not require space before EI token

Open rstawarz opened this issue 13 years ago • 3 comments

The PDF behind the failure we saw, which was classified in https://github.com/yob/pdf-reader/pull/17#issuecomment-5852356, has an embedded image with no space between the end of the image data and the EI token. I pulled out the section of the PDF that was causing the failure and added a spec here: https://github.com/rstawarz/pdf-reader/commit/9601d33c6999a45074f9d0c8ba3366daec43afb1

Reading the PDF spec, it seems the embedded image definition follows that of stream objects (section 7.3.8), which is beautifully written as "There should be an end-of-line marker after the data and before endstream;"... the operative words being 'should be'.

I was going to change the buffer parser directly, but you have specs in there that specify that an 'EI' should be allowed inside the image stream. Without implementing some sort of lookahead, it seems the two are mutually exclusive. Any thoughts?
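To make the conflict concrete, here is a minimal Ruby sketch of the two naive rules. Both helper names and the sample byte strings are hypothetical illustrations, not pdf-reader's actual buffer code. It shows that requiring whitespace before EI misses the no-space case, while stopping at the first EI truncates image data that happens to contain those bytes.

```ruby
# Two naive rules for locating the end of inline image data.
# Hypothetical helpers for illustration; not pdf-reader's internals.

# Rule A: require PDF whitespace before EI.
# Returns the length of the image data, or nil if no terminator is found.
def image_length_strict(bytes)
  bytes.index(/[\x00\t\n\f\r ]EI/n)
end

# Rule B: stop at the first EI, whether or not whitespace precedes it.
def image_length_loose(bytes)
  bytes.index("EI")
end

# Made-up sample data (assumption): three image bytes, then EI with no space.
no_space = "\x01\x02\x03EI Q".b
# Made-up sample data (assumption): 0x45 0x49 appears inside the image bytes.
ei_in_data = "\x01EI\x02\x03 EI Q".b

p image_length_strict(no_space)    # nil: rule A never finds the terminator
p image_length_loose(no_space)     # 3:   rule B handles the no-space case
p image_length_strict(ei_in_data)  # 5:   rule A skips the EI inside the data
p image_length_loose(ei_in_data)   # 1:   rule B truncates the image early
```

Neither rule gets both inputs right, which is why some form of lookahead seems necessary.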

rstawarz, May 24 '12 21:05

Thanks for the spec and clear issue report.

Last year another user had a sample PDF that had the bytes 0x45 0x49 (EI) inside an inline image.

Your PDF and the earlier one seem mutually exclusive - I'm not sure what the best behaviour is.

yob, May 28 '12 01:05

No problem... this is the great world of the PDF Spec :)

My thought was to peek at the couple of bytes AFTER the 'EI' end token to check for the start of the next token, but I would think that check would have to cover ANY possible token.
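A minimal sketch of that peek-ahead idea in Ruby, under a simplifying assumption: instead of recognising every possible next token, it only checks that the byte right after EI is PDF whitespace or the end of input. The method name `find_image_end` and the `WHITESPACE` constant are hypothetical and do not exist in pdf-reader.

```ruby
# Hypothetical lookahead heuristic; not pdf-reader's implementation.
# PDF whitespace bytes: NUL, tab, line feed, form feed, carriage return, space.
WHITESPACE = "\x00\t\n\f\r ".b

# Returns the byte offset of the EI that terminates the image data, or nil.
def find_image_end(bytes)
  pos = 0
  while (pos = bytes.index("EI", pos))
    following = bytes[pos + 2]
    # Accept this EI only if it is followed by whitespace or end of input.
    return pos if following.nil? || WHITESPACE.include?(following)
    pos += 1 # otherwise treat it as image data and keep scanning
  end
  nil
end

p find_image_end("\x01\x02\x03EI Q".b)     # 3: no space before EI is fine
p find_image_end("\x01EI\x02\x03 EI Q".b)  # 6: EI inside the data is skipped
p find_image_end("\x01EI \x02EI Q".b)      # 1: false positive, see note
```

The last call shows the limit of the heuristic: image data that contains EI followed by a whitespace byte is still cut short, so a stricter check on what follows would be needed to rule that out.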

rstawarz, May 28 '12 20:05

Having a similar problem with a very specific accounting document...

andrewajo, Jun 11 '12 19:06