pypdf icon indicating copy to clipboard operation
pypdf copied to clipboard

BUG: fix for merge page with annotation #3467

Open HSY-999 opened this issue 1 month ago • 5 comments

see #3467

there are a couple different solutions. this is the one i made.

for my solution, i have to assume that the key name inside DictionaryObject == __init__ parameter names. for example /Vertices and vertices. with these changes the example in #3467 exits normally.

i had to modify the __init__ logic for Polygon and PolyLine, as they both expected tuples which is not the format they are stored in DictionaryObject

if we agree this is acceptable solution, i will also add test case adapted from the issue.

HSY-999 avatar Nov 21 '25 21:11 HSY-999

* What happens to keyword-only arguments?

it does not work in a small example, but inspect does support matching on keyword arguments in a different list[str] with kwonlyargs. an additional check for this would suffice.

* What happens to unmatched keys?

they are not passed along to __init__ of the class we are trying to clone.

* What happens to keys which have to be renamed for some reasons, for example because they conflict with an existing name (like a builtin)?

good point. i will think on this. i am thinking of having some internal value in DictionaryObject or PdfObject which allow a developer to write its own key value pair if they are not the same.

HSY-999 avatar Nov 24 '25 22:11 HSY-999

i made changes to the pull request based on feedback. i still need to implement automated test, as well as research renaming keys. i have converted the pull request to a draft and will change the pull request when it is ready for review again.

HSY-999 avatar Nov 25 '25 21:11 HSY-999

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 97.16%. Comparing base (2f1b63e) to head (5266c84). :warning: Report is 14 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3532   +/-   ##
=======================================
  Coverage   97.15%   97.16%           
=======================================
  Files          56       56           
  Lines        9785     9814   +29     
  Branches     1785     1790    +5     
=======================================
+ Hits         9507     9536   +29     
  Misses        167      167           
  Partials      111      111           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Nov 26 '25 19:11 codecov[bot]

i have introduced changes based on feedback. i still need to think about a solution to the keys conflicting with python builtins.

HSY-999 avatar Dec 05 '25 21:12 HSY-999

i have implemented possible solution for renaming keyword args. @stefan6419846 it is available to review when you are ready.

for the set_alt_arg_name function, you provide it keyword stored in DictionaryObject, and the value is the argument name. for example:

"/Vertices" -> "vertices"

HSY-999 avatar Dec 09 '25 18:12 HSY-999

i review some code in _markup_annotation.py and see classes (Ellipse, Highlight) which have separate __init__ argument name and DictionaryObject name. i will edit if need to make classes works with change

HSY-999 avatar Dec 16 '25 20:12 HSY-999

d__ = cast(
            "DictionaryObject",
            self._reference_clone(self.__new__(self.__class__), pdf_dest, force_duplicate),
        )

@stefan6419846 is there reason code uses self.__class__(), instead of self.__new__(self.__class__)? this change would fix the issues i think. very simple. self__class__() call __init__(), but what i think we really want is new instance without it to be initialized.

the dictionary key value pairs get copy in _clone. my pull request is very confusing compared to the self.__new__(self.__class__) change

HSY-999 avatar Dec 19 '25 20:12 HSY-999

d__ = cast(
            "DictionaryObject",
            self._reference_clone(self.__new__(self.__class__), pdf_dest, force_duplicate),
        )

@stefan6419846 is there reason code uses self.__class__(), instead of self.__new__(self.__class__)? this change would fix the issues i think. very simple. self__class__() call __init__(), but what i think we really want is new instance without it to be initialized.

the dictionary key value pairs get copy in _clone. my pull request is very confusing compared to the self.__new__(self.__class__) change

i test the changes for the feature to work yesterday. it seems to fix the issue and pass the test cases. i will open a new pull request with those changes and you may review it. i think it is a better solution.

HSY-999 avatar Dec 20 '25 17:12 HSY-999