ENH: Add PageObject.images attribute

Open MartinThoma opened this issue 1 year ago • 2 comments

Sep 07 '22 19:09 MartinThoma

Codecov Report

Base: 94.71% // Head: 94.55% // Decreases project coverage by -0.15% :warning:

Coverage data is based on head (50447af) compared to base (71de6c8). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1330      +/-   ##
==========================================
- Coverage   94.71%   94.55%   -0.16%     
==========================================
  Files          30       28       -2     
  Lines        5181     5016     -165     
  Branches     1060     1033      -27     
==========================================
- Hits         4907     4743     -164     
  Misses        164      164              
+ Partials      110      109       -1

| Impacted Files | Coverage Δ | |
|---|---|---|
| PyPDF2/_page.py | `95.12% <100.00%> (+0.12%)` | :arrow_up: |
| PyPDF2/filters.py | `97.23% <100.00%> (+0.01%)` | :arrow_up: |
| PyPDF2/generic/_base.py | `100.00% <0.00%> (ø)` | |
| PyPDF2/_utils.py | | |
| PyPDF2/__init__.py | | |
| PyPDF2/_writer.py | `91.10% <0.00%> (+0.06%)` | :arrow_up: |


Sep 15 '22 20:09 codecov[bot]

@pubpub-zz @MasterOdin What do you think about this PR?

While writing it, I realized that PyPDF2 extracts images incorrectly in some cases. I marked the affected tests with xfail. The point of this PR is not to fix those issues, but to provide a convenient interface for getting images from PDF pages. That means:

- Define the property / the method to get images
- Define the return value (List[File] as well as the new File class)
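To make the proposed interface easier to discuss, here is a minimal sketch of what a `File` container and an `images` property could look like. The field names (`name`, `data`, `mime_type`) and the `PageSketch` stand-in class are assumptions for illustration only; they are not taken from the PR's actual code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class File:
    """Hypothetical container for one extracted image (field names are assumptions)."""

    name: str
    data: bytes
    mime_type: str


class PageSketch:
    """Stand-in for PageObject; NOT the real PyPDF2 class."""

    def __init__(self, raw_images: List[Tuple[str, bytes, str]]) -> None:
        # Each entry is (name, data, mime_type); real extraction is out of scope here.
        self._raw_images = raw_images

    @property
    def images(self) -> List[File]:
        # Wrap every raw entry in a File so callers get a uniform return type.
        return [File(name=n, data=d, mime_type=m) for n, d, m in self._raw_images]
```

With such a property, a caller could iterate over `page.images` and write each `File.data` to disk without touching the page's resource dictionaries directly.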

@pubpub-zz You mentioned that this method might not get all images of a page. For this PR, this would be acceptable to me. We can fix that later.

As a follow-up step we might use the File class for attachments as well.

I'm uncertain about the mime_type parts. Should we use extension everywhere instead?

I chose mime_type because of spelling inconsistencies in extensions, such as:

- PNG vs png
- jpg vs jpeg
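The spelling variants above all collapse to a single value once they are mapped to a MIME type. A rough sketch, assuming a hand-written lookup table and a hypothetical helper `mime_type_from_extension` (neither is part of PyPDF2):

```python
# Hypothetical lookup table for illustration; not part of PyPDF2.
_EXTENSION_TO_MIME = {
    "png": "image/png",
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
}


def mime_type_from_extension(extension: str) -> str:
    """Map an extension in any common spelling to one canonical MIME type."""
    # Drop surrounding whitespace and a leading dot, then ignore case.
    key = extension.strip().lstrip(".").lower()
    # Unknown extensions fall back to the generic binary type.
    return _EXTENSION_TO_MIME.get(key, "application/octet-stream")
```

Comparing `mime_type_from_extension("PNG")` with `mime_type_from_extension(".png")`, or `"jpg"` with `"jpeg"`, yields equal values, which is the property that makes MIME types attractive as the stored field.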

Additionally, I'm uncertain if using extension vs mime_type makes a difference if we use the File class for attachments as well.

Sep 17 '22 13:09 MartinThoma
