pypdf
Microsoft Word table of contents Link annotation error.
I am trying to use PdfReader and PdfWriter to read and write annotations in a PDF file. The PDF is produced by Microsoft Word via Save As PDF. The Word file has 3 simple pages with the headings Page 1, Page 2, and Page 3, plus an automatic table of contents built from these headings. The links in the table of contents become Link annotations in the PDF file. The annotation itself looks like this:
{'/Subtype': '/Link', '/Rect': [82.8, 711.57, 554.55, 731.07], '/BS': {'/W': 0}, '/F': 4, '/Dest': [IndirectObject(3, 0, 1202232362752), '/XYZ', 82, 785, 0], '/StructParent': 3}
The problem is that the value of the '/Dest' key is a list, but the code in _writer.py always expects a dictionary. The program then tries to read tmp["target_page_index"] from the list, so it crashes with an error.
Please help.
if to_add.get("/Subtype") == "/Link" and "/Dest" in to_add:
    tmp = cast(Dict[Any, Any], to_add[NameObject("/Dest")])
    dest = Destination(
        NameObject("/LinkName"),
        tmp["target_page_index"],
        Fit(
            fit_type=tmp["fit"], fit_args=dict(tmp)["fit_args"]
        ),  # I have no clue why this dict-hack is necessary
    )
    to_add[NameObject("/Dest")] = dest.dest_array
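To illustrate the mismatch without depending on pypdf, here is a minimal sketch using plain Python objects. The string "page-ref" stands in for the IndirectObject that points at the target page, and describe_dest is a hypothetical helper, not part of pypdf. It assumes that a list-valued /Dest follows the PDF explicit destination layout [page, fit_type, *fit_args], and that a dict-valued /Dest carries the keys indexed in the snippet above.

```python
from typing import Any, Dict, List, Union

# Plain-Python stand-in for the annotation from the report; "page-ref"
# replaces the IndirectObject that points at the target page.
annotation = {
    "/Subtype": "/Link",
    "/Dest": ["page-ref", "/XYZ", 82, 785, 0],
}


def describe_dest(dest: Union[List[Any], Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: accept both shapes of /Dest.

    A list is treated as an explicit destination array,
    [page, fit_type, *fit_args]. A dict is assumed to carry the keys
    that the _writer.py snippet indexes.
    """
    if isinstance(dest, list):
        page, fit_type, *fit_args = dest
        return {"page": page, "fit": fit_type, "fit_args": fit_args}
    return {
        "page": dest["target_page_index"],
        "fit": dest["fit"],
        "fit_args": list(dest["fit_args"]),
    }


# Indexing the list with a string key reproduces the reported TypeError.
error_message = ""
try:
    annotation["/Dest"]["target_page_index"]
except TypeError as exc:
    error_message = str(exc)

# Checking the type first avoids the crash for the list form.
parsed = describe_dest(annotation["/Dest"])
```

A type check of this kind before indexing would let the list form produced by Word pass through. Whether pypdf should handle it this way is a question for the maintainers.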
Environment
$ python -m platform
Windows-10-10.0.19043-SP0
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.2, crypt_provider=('cryptography', '37.0.4'), PIL=9.4.0
Code + PDF
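from io import BytesIO

from pypdf import PdfReader, PdfWriter

# filenames: list of paths to the input PDF files (defined elsewhere)
annotations = {}
writer = PdfWriter()
in_memory_file = BytesIO()
# Collect the annotations of every input file, grouped by page index.
for filename in filenames:
    reader = PdfReader(filename, strict=False)
    for page_idx, page in enumerate(reader.pages):
        if "/Annots" in page:
            for annot in page["/Annots"]:
                if not annotations.get(page_idx):
                    annotations[page_idx] = []
                annotations[page_idx].append(annot.get_object())
    del reader
# Copy the pages of the first file into the writer.
reader = PdfReader(filenames[0])
for page_idx, page in enumerate(reader.pages):
    writer.add_page(page)
del reader
writer.remove_links()
# Re-add the collected annotations; this is where the error is raised.
for page_idx in annotations:
    for annot in annotations[page_idx]:
        writer.add_annotation(page_number=page_idx, annotation=annot)
writer.write(in_memory_file)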
Traceback
This is the complete traceback I see:
Traceback (most recent call last):
  File "C:\NOSKOV\030_DEV\web_services\skotch3\src\backend\entrypoints\..\logic\service_layer\message_bus.py", line 537, in handle_command
    result = handler(command, self._uow, self.handle)
  File "C:\NOSKOV\030_DEV\web_services\skotch3\src\backend\entrypoints\..\logic\service_layer\command_handlers\command_service_handlers.py", line 929, in mix_pdf_files
    writer.add_annotation(page_number=page_idx, annotation=annot)
  File "C:\NOSKOV\030_DEV\web_services\skotch3\src\backend\venv\lib\site-packages\pypdf\_writer.py", line 2803, in add_annotation
    tmp["target_page_index"],
TypeError: list indices must be integers or slices, not str