pypdf
Resolve indirect objects when reading metadata
Explanation
The pattern writer.add_metadata(reader.metadata) does not work because the metadata values can be indirect objects.
Having to resolve them by hand is unfortunate and adds unnecessary complexity for the user.
Code Example
from PyPDF2 import PdfReader, PdfWriter
reader = PdfReader("example.pdf")
writer = PdfWriter()
# How I want to use it
writer.add_metadata(reader.metadata)
I imagine something like this on top of the existing code:
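from typing import Any, Dict
from PyPDF2.generic import IndirectObject

def get_meta(reader: PdfReader) -> Dict[str, Any]:
    """Return the reader's metadata with indirect objects resolved."""
    meta_new: Dict[str, Any] = {}
    # reader.metadata is None when the file has no document information
    for key, value in (reader.metadata or {}).items():
        i = 0  # cap the number of hops; I'm not sure if there could be infinite loops
        while isinstance(value, IndirectObject) and i < 3:
            # Follow the current reference instead of looking up the same key again
            value = value.get_object()
            i += 1
        if not isinstance(value, IndirectObject):
            meta_new[key] = value
    return meta_new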
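On the infinite-loop worry: instead of a fixed hop limit, the loop could remember which references it has already followed and stop on a repeat. The sketch below shows that idea without needing a PDF file. FakeIndirect, resolve and resolve_all are hypothetical names made up for illustration and are not part of PyPDF2; the only thing assumed about a real indirect object is that it has an identifier and a get_object() method.

```python
from typing import Any, Dict, Set


class FakeIndirect:
    """Hypothetical stand-in for an indirect reference (not a PyPDF2 class)."""

    def __init__(self, table: Dict[int, Any], idnum: int) -> None:
        self.table = table
        self.idnum = idnum

    def get_object(self) -> Any:
        # One hop only: the result may itself be another reference.
        return self.table[self.idnum]


def resolve(value: Any) -> Any:
    """Follow references until a direct value is reached; None on a cycle."""
    seen: Set[int] = set()
    while isinstance(value, FakeIndirect):
        if value.idnum in seen:
            return None  # reference chain loops back on itself
        seen.add(value.idnum)
        value = value.get_object()
    return value


def resolve_all(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve every entry and drop the ones that could not be resolved."""
    resolved = {key: resolve(value) for key, value in meta.items()}
    return {key: value for key, value in resolved.items() if value is not None}


# Toy object table: 2 points at 1, while 3 and 4 point at each other.
table: Dict[int, Any] = {}
table[1] = "Some Title"
table[2] = FakeIndirect(table, 1)
table[3] = FakeIndirect(table, 4)
table[4] = FakeIndirect(table, 3)

meta = {
    "/Title": FakeIndirect(table, 2),
    "/Author": "Jane Doe",
    "/Broken": FakeIndirect(table, 3),
}
print(resolve_all(meta))  # {'/Title': 'Some Title', '/Author': 'Jane Doe'}
```

The seen set bounds the loop by the number of distinct references, so a long but valid chain still resolves, while a cyclic one is dropped rather than looping forever.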