ENH: Add PageObject.images attribute

Open · MartinThoma opened this pull request 1 year ago · 2 comments

MartinThoma, Sep 07 '22 19:09

Codecov Report

Base: 94.71% // Head: 94.55% // Decreases project coverage by 0.15% :warning:

Coverage data is based on head (50447af) compared to base (71de6c8). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1330      +/-   ##
==========================================
- Coverage   94.71%   94.55%   -0.16%     
==========================================
  Files          30       28       -2     
  Lines        5181     5016     -165     
  Branches     1060     1033      -27     
==========================================
- Hits         4907     4743     -164     
  Misses        164      164              
+ Partials      110      109       -1     
| Impacted Files | Coverage Δ |
|---|---|
| `PyPDF2/_page.py` | 95.12% <100.00%> (+0.12%) :arrow_up: |
| `PyPDF2/filters.py` | 97.23% <100.00%> (+0.01%) :arrow_up: |
| `PyPDF2/generic/_base.py` | 100.00% <0.00%> (ø) |
| `PyPDF2/_utils.py` | |
| `PyPDF2/__init__.py` | |
| `PyPDF2/_writer.py` | 91.10% <0.00%> (+0.06%) :arrow_up: |

codecov[bot], Sep 15 '22 20:09

@pubpub-zz @MasterOdin What do you think about this PR?

While writing it, I realized that PyPDF2 extracts some images incorrectly. I marked the affected tests with xfail. The point of this PR is not to fix those issues but to provide a convenient interface for getting images from PDF pages. That means:

  • Define the property (or method) that returns the images
  • Define the return value (List[File], together with the new File class)
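A minimal sketch of the interface the two bullets describe, assuming a File class that simply pairs a suggested file name with the raw image bytes. The mime_type field and the FakePage stand-in are hypothetical illustrations, not PyPDF2's actual implementation; a real PageObject would decode its image XObjects rather than hold a prepared list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    """An extracted file: a suggested file name plus its raw bytes."""

    name: str
    data: bytes
    # Hypothetical field: the thread is still undecided between a MIME type and an extension.
    mime_type: str = "application/octet-stream"


@dataclass
class FakePage:
    """Hypothetical stand-in for PageObject that already holds decoded images."""

    _decoded: List[File] = field(default_factory=list)

    @property
    def images(self) -> List[File]:
        # A property (not a method), so callers write page.images as in the PR title.
        # Returning a copy keeps callers from mutating the page's internal list.
        return list(self._decoded)


page = FakePage([File(name="Im0.png", data=b"\x89PNG...", mime_type="image/png")])
for image in page.images:
    print(image.name, image.mime_type, len(image.data))
```

With this shape, saving an image is just a matter of writing image.data to a file named image.name.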

@pubpub-zz You mentioned that this method might not return all images on a page. For this PR, that is acceptable to me; we can fix it later.

As a follow-up, we might use the File class for attachments as well.

I'm uncertain about the mime_type parts. Should we use the file extension everywhere instead?

I chose the MIME type because of spelling inconsistencies like these:

  • PNG vs png
  • jpg vs jpeg
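The spelling problem can be illustrated with a small lookup that maps every extension variant to a single MIME type. The table below is a hypothetical, deliberately tiny sample, not what PyPDF2 uses; Python's standard mimetypes module offers a fuller mapping, but its results can depend on the system's mime.types files.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical minimal table: each spelling variant maps to one canonical MIME type.
_EXTENSION_TO_MIME = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".jpe": "image/jpeg",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def mime_type_for(filename: str) -> Optional[str]:
    """Return the canonical MIME type for a file name, or None if unknown."""
    # Lower-casing first makes "PNG" and "png" equivalent.
    suffix = PurePath(filename).suffix.lower()
    return _EXTENSION_TO_MIME.get(suffix)


for name in ["Im0.PNG", "Im0.png", "photo.jpg", "photo.jpeg", "notes.txt"]:
    print(name, "->", mime_type_for(name))
```

Comparing mime_type_for(a) == mime_type_for(b) then treats PNG/png and jpg/jpeg as the same format, which a plain comparison of extension strings would not.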

Additionally, I'm unsure whether choosing extension or mime_type makes a difference if we also use the File class for attachments.

MartinThoma, Sep 17 '22 13:09