pypdf
pypdf copied to clipboard
MAINT: Use consistent method keywords
Explanation
As PyPDF2 has evolved, some of the internal naming conventions have not been consistent. One recently addressed issue was the outline/bookmark component; I think that was the largest example. As I have been exploring the code base, I have noticed others. I propose that these keywords be systematically implemented across the library and that the deprecation process utilizes an internally consistent process.
Below are the list of keywords that I can see which have inconsistent implementation (only including non-deprecated methods):
pagenum
, page_number
, pageNumber
- pagenum is used in:
-
PdfWriter.addBookmark
-
PdfWriter.add_outline_item
-
PdfWriter.add_named_destination
-
PdfWriter.add_uri
-
PdfWriter.add_link
-
-
pageNumber
is used in:-
PdfWriter.get_page
-
-
page_number
is used in:-
PdfWriter.add_annotation
-
PdfWriter.get_page
-
pagedest
and dest
-
pagedest
is used in:-
PdfWriter.add_link
-
-
dest
is used in:-
PdfWriter.add_outline_item_destination
-
PdfWriter.add_named_destination_object
-
indirect_ref
and ido
-
indirect_ref
is used in:-
PdfReader._get_page_number_by_indirect
-
PdfReader._flatten
-
-
indirect_reference
is used in:-
PdfReader._get_object_from_stream
-
-
ido
is used in:-
PdfWriter.get_object
-
pwd
and password
-
password
is used in:-
PdfReader.decrypt
-
-
pwd
is used in:-
PdfWriter.encrypt
(asuser_pwd
,owner_pwd
)
-
Extrinsically inconsistent methods
There are some deprecated methods with keywords that have been changed in the updated function. For example, PdfWriter.addAttachment
uses fname and PdfWriter.add_attachment
uses filename or PdfWriter.removeImages
uses ignoreByteStringObject
and PdfWriter.remove_images
uses ignore_byte_string_object
. These changes are silent.
Intrinsically inconsistent methods
There are some methods which use keywords that are both snake_case and "smooshed" case. For example, PdfReader._read_xref_tables_and_trailers
has both startxref
and xref_issue_nr
,
Inconsistent deprecation implementation
Some of the keywords have been deprecated within the methods. For example, PdfWriter.get_page
. Whereas other keywords have been deprecated via decorator; for example, PdfWriter.add_outline_item_dict
.
Concluding remarks (for this long winded post)
I don't think all of these examples necessarily need to be changed; I've included them to be as thorough as possible. However, in my opinion, snake_case is best and explicit is better than implied via abbreviation (usually). Please list additional examples that you see and any proposed approaches.
Wow, amazing work! 😮 Thank you so much 👏👏🤗
I completely agree with your concluding remarks!
I would completely ignore already deprecated functions / parameters. Think of them as no longer being present in the library. You're seeing a ghost of the past there :-)
I have another one: Instead of page_number
or start_page_number
or similar, PdfMerger.merge
uses position
I have another one: Instead of
page_number
orstart_page_number
or similar,PdfMerger.merge
usesposition
Agree, Page_number
should be prefered
startxref
comes from the PDF standard and thus is consistent. The rest was adjusted :-)