Fix NoneType object is not subscriptable from _insert_filtered_annotations
There are cases where the IndirectObject is not None, but d[0] still fails with TypeError: 'NoneType' object is not subscriptable in generic/_base.py, at return self._get_object_with_check()[key], because the object it refers to cannot be resolved.
I previously reported this issue in #3211, but the fix for that issue didn't fix this one.
Unfortunately, I still can't attach the PDF file that triggered this issue because it contains personal information. If someone has an idea how to create a PDF to test this with, I'd be happy to try. I fully understand that a maintainer would not be keen on including fixes for rare bugs without a test.
More details on the error
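The script I used to reproduce the error:
import glob

from pypdf import PdfWriter


def merge_pdfs():
    # Append every PDF in the current directory to a single writer.
    # Note: on a second run this also picks up the previous merged.pdf.
    with PdfWriter() as merger:
        for pdf in glob.glob("*.pdf"):
            merger.append(pdf)
        merger.write("merged.pdf")


if __name__ == '__main__':
    merge_pdfs()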
In the script directory I have broken.pdf (the offending file I can't share here) and empty.pdf.
This fails as follows:
Object 6 0 not defined.
Object 6 0 not defined.
Overwriting cache for 0 6
Traceback (most recent call last):
File "/home/kees/Projects/pypdf/check_merge.py", line 21, in <module>
merge_pdfs()
~~~~~~~~~~^^
File "/home/kees/Projects/pypdf/check_merge.py", line 16, in merge_pdfs
merger.append(pdf)
~~~~~~~~~~~~~^^^^^
File "/home/kees/Projects/pypdf/pypdf/_writer.py", line 2693, in append
self.merge(
~~~~~~~~~~^
None,
^^^^^
...<4 lines>...
excluded_fields,
^^^^^^^^^^^^^^^^
)
^
File "/home/kees/Projects/pypdf/pypdf/_writer.py", line 2851, in merge
lst = self._insert_filtered_annotations(
pag.original_page.get("/Annots", []), pag, srcpages, reader
)
File "/home/kees/Projects/pypdf/pypdf/_writer.py", line 3055, in _insert_filtered_annotations
p = self._get_cloned_page(d[0], pages, reader)
~^^^
File "/home/kees/Projects/pypdf/pypdf/generic/_base.py", line 402, in __getitem__
return self._get_object_with_check()[key] # type: ignore
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
TypeError: 'NoneType' object is not subscriptable
Setting a breakpoint before the error occurs gives:
(Pdb) d
IndirectObject(6, 0, 127177489623296)
(Pdb) d[0]
Object 6 0 not defined.
Overwriting cache for 0 6
*** TypeError: 'NoneType' object is not subscriptable
(To be fair, I cheated a bit in this last example: the variable name d clashes with the pdb command d (down), so I renamed it.)
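The failure mode in the traceback and pdb session above can be sketched without pypdf. In this minimal, self-contained illustration, FakeIndirectRef is a hypothetical stand-in for pypdf's IndirectObject (not the real class) whose resolution always yields None, and first_dest_entry is a hypothetical guard that resolves the reference before indexing and returns None when nothing can be resolved, so a caller could skip the annotation instead of crashing. It is not the code from this PR.

```python
from typing import Any, Optional


class FakeIndirectRef:
    """Hypothetical stand-in for pypdf's IndirectObject, not the real class."""

    def __init__(self, idnum: int, generation: int) -> None:
        self.idnum = idnum
        self.generation = generation

    def get_object(self) -> Optional[Any]:
        # Simulates "Object 6 0 not defined.": the referenced object is missing.
        return None

    def __getitem__(self, key: int) -> Any:
        # Mirrors `self._get_object_with_check()[key]`: resolve, then index.
        return self.get_object()[key]


def first_dest_entry(dest: Any) -> Optional[Any]:
    """Hypothetical guard: resolve the destination before indexing into it."""
    resolved = dest.get_object() if hasattr(dest, "get_object") else dest
    if resolved is None or len(resolved) == 0:
        # Dangling reference or empty array: signal the caller to skip it.
        return None
    return resolved[0]


if __name__ == "__main__":
    d = FakeIndirectRef(6, 0)
    try:
        d[0]
    except TypeError as exc:
        print(exc)  # 'NoneType' object is not subscriptable
    print(first_dest_entry(d))  # None
    print(first_dest_entry(["page", "/Fit"]))  # page
```

Resolving once and checking for None keeps the error handling at the call site, rather than relying on the subscript operator of the reference type.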
Codecov Report
:x: Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 96.95%. Comparing base (bc318d7) to head (9a27b3f).
:warning: Report is 20 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| pypdf/_writer.py | 50.00% | 2 Missing :warning: |
@@ Coverage Diff @@
## main #3444 +/- ##
==========================================
- Coverage 96.97% 96.95% -0.03%
==========================================
Files 54 54
Lines 9337 9340 +3
Branches 1711 1711
==========================================
+ Hits 9055 9056 +1
- Misses 168 170 +2
Partials 114 114
Thanks for the PR. Please have a look at the failing checks.
@stefan6419846 Which checks do you mean? I only see coverage, which I can't fix, as I stated in the description. If you see anything else I could improve that would make you consider merging this, please let me know!
We have three failing checks:
- The patch coverage is incomplete, as we have no test for the error case.
- The project coverage decreases due to missing patch coverage.
- The PR title does not match our requirements.
At least the title should be fixable right away.
There are surely ways to modify the existing example files used in the tests to show this behavior (you should be able to find some examples in the tests). I currently do not have the resources to look into this myself. You could also anonymize your file manually, but this requires more knowledge and might require you to provide me with the original file to look into.
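Regarding a test file: rather than modifying an existing example file as suggested, one alternative is to generate a tiny PDF from scratch with the standard library. The sketch below writes a one-page PDF whose link annotation has /Dest 6 0 R while object 6 is never defined, which presumably makes a reader log something like "Object 6 0 not defined.". It is an assumption that this matches the shape of the private broken.pdf; whether pypdf's merge then reaches the d[0] call in _insert_filtered_annotations has not been verified, and the dangling reference may need to sit elsewhere (for example inside a GoTo action) to reproduce it. The function name build_pdf_with_dangling_dest and the file name dangling.pdf are hypothetical.

```python
from pathlib import Path


def build_pdf_with_dangling_dest() -> bytes:
    """Build a one-page PDF whose link annotation points at missing object 6 0.

    Hypothetical test fixture: objects 1-5 exist, object 6 is referenced
    but never defined and has no xref entry.
    """
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 5 0 R /Annots [4 0 R] >>",
        # Link annotation whose destination references the undefined object 6.
        b"<< /Type /Annot /Subtype /Link /Rect [0 0 100 100] /Dest 6 0 R >>",
        # Empty content stream for the page.
        b"<< /Length 0 >>\nstream\n\nendstream",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    # /Size covers object 0 plus objects 1-5; object 6 is deliberately absent.
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos  # "%%%%" renders as "%%"
    return bytes(out)


if __name__ == "__main__":
    Path("dangling.pdf").write_bytes(build_pdf_with_dangling_dest())
```

Placing the generated dangling.pdf next to empty.pdf and running the reproduction script would be a quick way to check whether it triggers the same TypeError.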