pypdf ENH: add decode_as_image() to ContentStreams


Open · pubpub-zz opened this pull request · 4 comments

closes #2613

May 01 '24 12:05 pubpub-zz

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 94.94%. Comparing base (b1b55e6) to head (b68b907). Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2615   +/-   ##
=======================================
  Coverage   94.93%   94.94%           
=======================================
  Files          50       50           
  Lines        8318     8327    +9     
  Branches     1668     1669    +1     
=======================================
+ Hits         7897     7906    +9     
  Misses        261      261           
  Partials      160      160
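The two percentages can be reproduced from the Hits/Misses/Partials rows. This is a minimal sketch assuming Codecov counts partial lines as not covered and, by default, rounds the ratio down to two decimals; `coverage_percent` is a hypothetical helper, not part of any Codecov tooling.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> str:
    """Coverage = hits / (hits + misses + partials), rounded down to 2 decimals.

    Assumption: partial lines count as not covered, and rounding is "down",
    which we understand to be Codecov's default.
    """
    lines = hits + misses + partials
    # Integer math avoids float rounding surprises; result is in hundredths of a percent.
    hundredths = hits * 10_000 // lines
    return f"{hundredths // 100}.{hundredths % 100:02d}"


print(coverage_percent(7897, 261, 160))  # base: main, 8318 lines
print(coverage_percent(7906, 261, 160))  # head: #2615, 8327 lines
```

All nine added lines are hits, so the ratio only moves from just under 94.94% (shown as 94.93%) to just over it.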


May 01 '24 12:05 codecov[bot]

Should we really expect the users to basically call decode_image on every object with arbitrary nesting as there might be a "hidden" image somewhere? This feels rather strange.

Additionally, what happens when it is not an image? We log a warning, but is there an exception as well due to invalid image data? If yes, why both?

May 01 '24 14:05 stefan6419846

> Should we really expect the users to basically call decode_image on every object with arbitrary nesting as there might be a "hidden" image somewhere? This feels rather strange.

Why is that strange? This offers a way to get an image from a stream where images are present but not part of page.images, for example images used in a pattern (as in the provided B2.pdf), but also images in annotations.
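To make the "arbitrary nesting" point concrete, here is a sketch of what a caller would have to do: walk every dictionary and array reachable from a page and collect anything whose /Subtype is /Image, including images inside pattern resources and annotation appearance streams. Plain Python dicts and lists stand in for PDF dictionaries and arrays, and `find_image_streams` is a hypothetical helper, not a pypdf API. Real PDF objects also contain indirect references that would need resolving, which this sketch omits. Each hit would then be a candidate for the proposed `decode_as_image()`.

```python
from typing import Any, Iterator, Optional, Set


def find_image_streams(obj: Any, path: str = "", seen: Optional[Set[int]] = None) -> Iterator[str]:
    """Yield the path of every dict whose /Subtype is /Image, at any nesting depth."""
    if seen is None:
        seen = set()
    if isinstance(obj, (dict, list)):
        # Guard against cycles (real PDFs can reference the same object twice).
        if id(obj) in seen:
            return
        seen.add(id(obj))
    if isinstance(obj, dict):
        if obj.get("/Subtype") == "/Image":
            yield path or "/"
        for key, value in obj.items():
            yield from find_image_streams(value, f"{path}{key}", seen)
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_image_streams(item, f"{path}[{index}]", seen)


# Hypothetical page: one "normal" image, one inside a tiling pattern, one in an annotation.
page = {
    "/Resources": {
        "/XObject": {"/Im0": {"/Subtype": "/Image"}},
        "/Pattern": {
            "/P0": {"/PatternType": 1, "/Resources": {"/XObject": {"/Im1": {"/Subtype": "/Image"}}}},
        },
    },
    "/Annots": [
        {"/Subtype": "/Stamp", "/AP": {"/N": {"/Resources": {"/XObject": {"/Im2": {"/Subtype": "/Image"}}}}}},
    ],
}

for hit in find_image_streams(page):
    print(hit)
```

Only the first path corresponds to an image in the page's own XObject resources; the other two are the "hidden" cases discussed here.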

> Additionally, what happens when it is not an image? We log a warning, but is there an exception as well due to invalid image data? If yes, why both?

I thought about this, and my concern is that this may hide some actual issues. I've completed the annotation.
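One way to answer "why both?" is to make each code path do exactly one thing: in lenient mode, log a warning and return None; in strict mode, raise. This is a sketch of that design choice only, not the behaviour of the PR; `decode_as_image_sketch`, `NotAnImageError`, the `strict` flag, and the `_data` key are all hypothetical, and real image decoding (filters, colour spaces, building a PIL image) is replaced by a placeholder.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("decode_sketch")


class NotAnImageError(ValueError):
    """Raised in strict mode when a stream is not an image XObject."""


def decode_as_image_sketch(stream: Dict[str, Any], strict: bool = False) -> Optional[bytes]:
    """Hypothetical stand-in for the proposed decode_as_image()."""
    subtype = stream.get("/Subtype")
    if subtype != "/Image":
        msg = f"stream is not an image (Subtype={subtype!r})"
        if strict:
            # Fail loudly so real problems are not hidden.
            raise NotAnImageError(msg)
        # Lenient: report once and let the caller skip this stream.
        logger.warning(msg)
        return None
    # Placeholder for real decoding: just return the (hypothetical) raw payload.
    return stream.get("_data", b"")


print(decode_as_image_sketch({"/Subtype": "/Form"}))  # logs a warning, prints None
```

With this split, a caller scanning many streams can use the lenient mode, while code that expects an image gets an exception instead of a silently skipped warning.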

May 01 '24 16:05 pubpub-zz

I am still not sure whether we can really expect users to examine every content stream for a possible image. Personally, I would prefer a clean solution, so I am going to leave this PR open for further discussion.

May 02 '24 13:05 stefan6419846
