pypdf
ENH: add decode_as_image() to ContentStreams
closes #2613
Codecov Report
All modified and coverable lines are covered by tests.
Project coverage is 94.94%. Comparing base (b1b55e6) to head (b68b907). Report is 1 commit behind head on main.
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2615   +/-   ##
=======================================
  Coverage   94.93%   94.94%
=======================================
  Files          50       50
  Lines        8318     8327     +9
  Branches     1668     1669     +1
=======================================
+ Hits         7897     7906     +9
  Misses        261      261
  Partials      160      160
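As a quick check of the table, the percentages can be recomputed from the line counts. This sketch assumes Codecov's ratio is hits / (hits + misses + partials) and that the report truncates to two decimals (rounding mode `down`); the truncation assumption is what makes main read 94.93% rather than 94.94%. The function name `codecov_ratio` is hypothetical.

```python
def codecov_ratio(hits: int, misses: int, partials: int, precision: int = 2) -> float:
    """Coverage % = hits / (hits + misses + partials), truncated (assumed 'round: down')."""
    lines = hits + misses + partials
    scale = 10 ** precision
    # Integer floor division truncates exactly, so 94.9386...% becomes 94.93%
    # instead of being rounded up to 94.94%.
    truncated = hits * 100 * scale // lines
    return truncated / scale


base = codecov_ratio(7897, 261, 160)  # main
head = codecov_ratio(7906, 261, 160)  # #2615
```

Both columns also satisfy Hits + Misses + Partials = Lines (7897 + 261 + 160 = 8318 and 7906 + 261 + 160 = 8327), so all nine new lines are counted as hits.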
Should we really expect users to call decode_image on every object, at arbitrary nesting depth, just because there might be a "hidden" image somewhere? This feels rather strange.
Additionally, what happens when it is not an image? We log a warning, but is an exception also raised because of invalid image data? If so, why both?
> Should we really expect the users to basically call decode_image on every object with arbitrary nesting as there might be a "hidden" image somewhere? This feels rather strange.
Why strange? This offers a way to get the image from a stream where images are present but are not included in the page's images (for example, an image used in a pattern, as in the provided B2.pdf, or in an annotation).
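To make the traversal concern concrete, here is a minimal sketch that walks an arbitrarily nested object tree and collects every node whose /Subtype is /Image. It models PDF objects as plain Python dicts and lists rather than pypdf classes, and `iter_image_candidates` is a hypothetical helper, not part of pypdf. It tracks visited containers because PDF object graphs can contain back-references, such as an annotation's /P entry pointing at its page.

```python
from typing import Any, Iterator, Optional, Set, Tuple


def iter_image_candidates(
    obj: Any, path: str = "", _seen: Optional[Set[int]] = None
) -> Iterator[Tuple[str, dict]]:
    """Yield (path, node) for every dict whose /Subtype is /Image, at any depth."""
    if _seen is None:
        _seen = set()
    if isinstance(obj, (dict, list)):
        if id(obj) in _seen:
            return  # already visited: breaks cycles such as annotation /P -> page
        _seen.add(id(obj))
    if isinstance(obj, dict):
        if obj.get("/Subtype") == "/Image":
            yield path, obj
        for key, value in obj.items():
            yield from iter_image_candidates(value, path + key, _seen)
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from iter_image_candidates(value, f"{path}[{index}]", _seen)


# Toy page (hypothetical data): one image in the page's own /XObject resources,
# one reachable only through a tiling pattern, one only through an annotation's
# normal appearance stream.
page = {
    "/Type": "/Page",
    "/Resources": {
        "/XObject": {"/Im0": {"/Subtype": "/Image", "/Width": 10}},
        "/Pattern": {
            "/P0": {
                "/PatternType": 1,
                "/Resources": {"/XObject": {"/Im1": {"/Subtype": "/Image"}}},
            }
        },
    },
    "/Annots": [
        {
            "/Subtype": "/Stamp",
            "/AP": {"/N": {"/Resources": {"/XObject": {"/Im2": {"/Subtype": "/Image"}}}}},
        }
    ],
}
page["/Annots"][0]["/P"] = page  # back-reference to the page creates a cycle
```

Only the first image sits directly in the page resources; the other two are found only because the walk descends into /Pattern and /Annots, which is exactly the kind of search the question above is worried about pushing onto users.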
> Additionally, what happens when it is no image? We log a warning, but is there an exception as well due to invalid image data? If yes, why both?
I thought about this, and my concern is that this may hide some actual issues. I've completed the annotation.
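One reading of the warning-plus-exception behavior discussed above: warn when the stream is not marked as an image, still attempt to decode it, and let any decoding error propagate so that real problems are not silently swallowed. This is a hypothetical sketch over dict-based stand-ins; `decode_as_image_sketch` and `toy_decoder` are invented for illustration and are not pypdf's implementation of decode_as_image().

```python
import logging
from typing import Any, Callable, List, Mapping

logger = logging.getLogger("decode_sketch")


def toy_decoder(stream: Mapping[str, Any]) -> List[List[int]]:
    """Hypothetical decoder: raw 8-bit grey data, one byte per pixel."""
    width, height, data = stream["/Width"], stream["/Height"], stream["data"]
    if len(data) != width * height:
        raise ValueError(f"expected {width * height} bytes, got {len(data)}")
    return [list(data[row * width:(row + 1) * width]) for row in range(height)]


def decode_as_image_sketch(
    stream: Mapping[str, Any], decoder: Callable[[Mapping[str, Any]], Any]
) -> Any:
    """Warn if the stream is not tagged as an image, then try to decode it anyway."""
    if stream.get("/Subtype") != "/Image":
        # Only a warning: a stream without /Subtype /Image may still hold pixel data.
        logger.warning("stream with keys %s does not seem to be an Image", sorted(stream))
    # No try/except here: a decoding failure reaches the caller instead of being
    # hidden behind the warning above.
    return decoder(stream)
```

Under this reading the warning explains why decoding was attempted on an unusual object, while the exception reports that the data really could not be decoded; dropping either one would lose information.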
I am still not sure whether we can really expect users to examine every content stream for a possible image. Personally, I would prefer a clean solution, so I am going to leave this PR open for further discussion.