pypdf BUG: Fix Parsing of Inline Images

Open speedplane opened this issue 7 years ago • 7 comments

The inline image parser does not look for whitespace before the EI keyword as it should. Thus if you have a content stream as follows, the parser would crash:

BI [inline image dictionary]
ID
asfASF213ad>]asf
213lkasdf9as12EI
QsdkfjasdfkjfdiI
EI
Q

Notice that an EI on one line followed by a Q on the next line occurs in two places. To find the real end of the image, we need to make sure the EI is preceded by whitespace.

This PR also adds protection against infinite loops in case the PDF is corrupt and the inline image never ends.
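
To illustrate the check described above, here is a minimal sketch in Python. It is not the actual pypdf code: the function name read_inline_image_data, the WHITESPACE constant, and the max_bytes limit are hypothetical names chosen for this example. It assumes the stream is positioned just after the ID keyword and its trailing whitespace.

```python
import io

# Byte values treated as whitespace in this sketch (space, tab, LF, CR, FF, NUL).
WHITESPACE = b" \t\n\r\x0c\x00"


def read_inline_image_data(stream, max_bytes=10_000_000):
    """Return the inline image bytes that precede a whitespace-delimited EI.

    Hypothetical helper for illustration only. Raises ValueError instead of
    looping forever when the stream ends or the limit is hit before EI.
    """
    data = bytearray()
    while True:
        if len(data) > max_bytes:
            raise ValueError("inline image exceeds max_bytes without EI")
        byte = stream.read(1)
        if not byte:
            # End of stream: a corrupt PDF must not cause an endless loop.
            raise ValueError("unexpected end of stream inside inline image")
        data += byte
        # Only treat EI as the end marker when whitespace precedes it.
        if len(data) >= 3 and data[-2:] == b"EI" and data[-3] in WHITESPACE:
            following = stream.read(1)
            if not following or following in WHITESPACE:
                # Drop the whitespace byte and the EI keyword itself.
                return bytes(data[:-3])
            # Not a real end marker: un-read the byte and keep scanning.
            stream.seek(-1, io.SEEK_CUR)


# The sample content stream from this issue, starting after "ID\n".
sample = io.BytesIO(
    b"asfASF213ad>]asf\n213lkasdf9as12EI\nQsdkfjasdfkjfdiI\nEI\nQ\n"
)
image_bytes = read_inline_image_data(sample)
rest = sample.read()  # whatever follows the inline image
```

With the sample above, the EI at the end of the second data line is skipped because it follows "2" rather than whitespace, so the image ends at the EI that sits on its own line.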

Feb 28 '17 05:02 speedplane

#331 also implements protection against incorrect images and makes parsing of inline images a lot faster.

Jul 20 '17 14:07 vstoykov

The current solution is not compatible with the recent BytesIO implementation. Would you mind adjusting your PR?

Apr 16 '22 13:04 MartinThoma

I fixed the merge conflict. I'm not sure what you're referring to re BytesIO.

Apr 16 '22 19:04 speedplane

> I fixed the merge conflict, I'm not sure what you're referring to re BytesIO.

CI is failing.

Apr 16 '22 20:04 MartinThoma

@speedplane We made some pretty heavy changes to PyPDF2 recently. If you search for `if tok2 == b"I":` in generic.py, you can see the section that you adjusted. Do you want to adjust the PR or open a new one?

Do you have an example PDF where this adjustment is necessary? Does it close one of the open issues?

Jun 19 '22 11:06 MartinThoma

It would help me a lot if we had an image that shows the described issue.

Jul 24 '22 07:07 MartinThoma

Sorry, this is all I have. I can't remember what this fixed or how it fixes it.

Aug 04 '22 03:08 speedplane

@speedplane The issue you addressed was fixed via #1327.

May I add you to https://pypdf2.readthedocs.io/en/latest/meta/CONTRIBUTORS.html ? Your PR was not merged, but you did make a valuable contribution with this PR. It was just me not being able to understand it at the time.

Sep 06 '22 19:09 MartinThoma
