pypdf
pypdf copied to clipboard
New function: get_form_fields_in_list()
Get form fields, using a list of labels, such as: get_form_fields_in_list([ '/Tx', '/Btn' ]) to get text and button (checkbox,...) values
Signed-off-by: Pascal Brand [email protected]
Codecov Report
Merging #1220 (3c58146) into main (2df8a4c) will increase coverage by
0.00%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## main #1220 +/- ##
=======================================
Coverage 92.43% 92.43%
=======================================
Files 24 24
Lines 4941 4943 +2
Branches 1025 1025
=======================================
+ Hits 4567 4569 +2
Misses 234 234
Partials 140 140
Impacted Files | Coverage Δ | |
---|---|---|
PyPDF2/_reader.py | 90.69% <100.00%> (+0.02%) |
:arrow_up: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
Thank you for your contribution 🙏
I want to avoid adding more and more methods. Although this adds flexibility, it does confuse. Instead, we should make a decision in which context those functions will be used and make them fit best for that.
In it's current form I don't think I will mehr it, but I appreciate that your pr does not make a breaking change and the discussion it starts. Let's use it to have that discussion :-)
Do you happen to need that function? What do you need it for?
Hi Martin,
I have to retrieve pdf form information, not only text ones, but also checkbox status (on/off).
I had a look at stackoverflow, such as https://stackoverflow.com/questions/51185296/reading-checkbox-pdf-with-python-on-a-filled-out-form or https://stackoverflow.com/questions/3984003/how-to-extract-pdf-fields-from-a-filled-out-form-in-python
Unfortunately, the answers were not as easy as the simple get_form_text_fields()
api function, for which you do not have to understand the pdf structure, and dig too much in the code.
This is why I added this function.
I use it with get_form_fields_in_list(['/Tx', '/Btn'])
, so that I have text and checkbox values in the result.
Of course, I could have added a new function such as get_form_checkbox_fields()
, but I did not want to add too many methods ('cos then someone would like to have get_form_text_and_checkbox_fields()
too, plus others I am not aware of).
I would actually rather remove get_form_text_fields
from PyPDF2 completely as I think this can easily done by PyPDF2 users.
Instead, we might want to extend the help of get_fields
educating the user ... or return something more meaningful than just a dictionary.
I'm uncertain about this.
I would neither remove get_form_text_fields() nor change its returned format.
Following https://github.com/py-pdf/PyPDF2/discussions/850 discussion, I am on the same line than @MasterOdin and @sekrause, that is legacy is important and API break is a pain for people using the library and people reading example code (for example from stackoverflow - old correct answers would not work anymore).
I guess that such consideration should be discussed with a broader range of people, both people updating pyPDF2 and people using it. I would be happy to participate.
Anyway, it is bit beyond this PR I suggest. Feel free to close this PR without merging it.
I agree that we should hear more opinions :+1: I will leave this PR open for discussions.
@MasterOdin / @pubpub-zz : As always, I'm very interested in your opinion :-)
Agree with @pascal-brand38 About keeping legacy interfaces
@pascal-brand38 Thank you for working on improving PyPDF2, but I've finally decided to close this PR.
The PR added get_form_fields_in_list
which can be easily implemented by the users:
def get_form_fields_in_list(reader):
formfields = reader.get_fields()
return {
formfields[field]["/T"]: formfields[field].get("/V")
for field in formfields
if formfields[field].get("/FT") in ft_list
}
The user still needs to have knowledge about the types of fields, meaning that this function doesn't make the interaction with PDF files simpler. Those two arguments combined lead me to the conclusion that it's better to not extend the public interface of PyPDF2 with this method (for the moment at least).