pypdf icon indicating copy to clipboard operation
pypdf copied to clipboard

MAINT: Use consistent method keywords

Open mtd91429 opened this issue 1 year ago • 4 comments

Explanation

As PyPDF2 has evolved, some of the internal naming conventions have not been consistent. One recently addressed issue was the outline/bookmark component; I think that was the largest example. As I have been exploring the code base, I have noticed others. I propose that these keywords be systematically implemented across the library and that the deprecation process utilizes an internally consistent process.

Below are the list of keywords that I can see which have inconsistent implementation (only including non-deprecated methods):

pagenum, page_number, pageNumber

  • pagenum is used in:
    • PdfWriter.addBookmark
    • PdfWriter.add_outline_item
    • PdfWriter.add_named_destination
    • PdfWriter.add_uri
    • PdfWriter.add_link
  • pageNumber is used in:
    • PdfWriter.get_page
  • page_number is used in:
    • PdfWriter.add_annotation
    • PdfWriter.get_page

pagedest and dest

  • pagedest is used in:
    • PdfWriter.add_link
  • dest is used in:
    • PdfWriter.add_outline_item_destination
    • PdfWriter.add_named_destination_object

indirect_ref and ido

  • indirect_ref is used in:
    • PdfReader._get_page_number_by_indirect
    • PdfReader._flatten
  • indirect_reference is used in:
    • PdfReader._get_object_from_stream
  • ido is used in:
    • PdfWriter.get_object

pwd and password

  • password is used in:
    • PdfReader.decrypt
  • pwd is used in:
    • PdfWriter.encrypt (as user_pwd, owner_pwd)

Extrinsically inconsistent methods

There are some deprecated methods with keywords that have been changed in the updated function. For example, PdfWriter.addAttachment uses fname and PdfWriter.add_attachment uses filename or PdfWriter.removeImages uses ignoreByteStringObject and PdfWriter.remove_images uses ignore_byte_string_object. These changes are silent.

Intrinsically inconsistent methods

There are some methods which use keywords that are both snake_case and "smooshed" case. For example, PdfReader._read_xref_tables_and_trailers has both startxref and xref_issue_nr,

Inconsistent deprecation implementation

Some of the keywords have been deprecated within the methods. For example, PdfWriter.get_page. Whereas other keywords have been deprecated via decorator; for example, PdfWriter.add_outline_item_dict.

Concluding remarks (for this long winded post)

I don't think all of these examples necessarily need to be changed; I've included them to be as thorough as possible. However, in my opinion, snake_case is best and explicit is better than implied via abbreviation (usually). Please list additional examples that you see and any proposed approaches.

mtd91429 avatar Jul 31 '22 01:07 mtd91429

Wow, amazing work! 😮 Thank you so much 👏👏🤗

I completely agree with your concluding remarks!

MartinThoma avatar Jul 31 '22 01:07 MartinThoma

I would completely ignore already deprecated functions / parameters. Think of them as no longer being present in the library. You're seeing a ghost of the past there :-)

MartinThoma avatar Jul 31 '22 01:07 MartinThoma

I have another one: Instead of page_number or start_page_number or similar, PdfMerger.merge uses position

MartinThoma avatar Aug 19 '22 04:08 MartinThoma

I have another one: Instead of page_number or start_page_number or similar, PdfMerger.merge uses position

Agree, Page_number should be prefered

pubpub-zz avatar Aug 19 '22 09:08 pubpub-zz

startxref comes from the PDF standard and thus is consistent. The rest was adjusted :-)

MartinThoma avatar Dec 10 '22 17:12 MartinThoma