pypdf

Resolve indirect objects when reading metadata

Open: MartinThoma opened this issue 1 year ago • 1 comment

Explanation

The pattern writer.add_metadata(reader.metadata) doesn't work when the metadata values are indirect objects, because they are passed on unresolved. This is unfortunate and pushes unnecessary complexity onto the user.

Code Example

from PyPDF2 import PdfReader, PdfWriter

reader = PdfReader("example.pdf")
writer = PdfWriter()

# How I want to use it
writer.add_metadata(reader.metadata)

MartinThoma, Jul 30 '22 05:07

I imagine something like this on top of the existing code:

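from typing import Any, Dict

from PyPDF2 import PdfReader
from PyPDF2.generic import IndirectObject


def get_meta(reader: PdfReader) -> Dict[str, Any]:
    """Return the document metadata with indirect objects resolved."""
    meta_new: Dict[str, Any] = {}
    # reader.metadata is None when the PDF has no document information.
    for key, value in (reader.metadata or {}).items():
        i = 0  # Hop limit; I'm not sure if there could be infinite loops
        while isinstance(value, IndirectObject) and i < 3:
            value = value.get_object()  # follow the reference one hop
            i += 1
        # Skip values that are still unresolved after the hop limit.
        if not isinstance(value, IndirectObject):
            meta_new[key] = value
    return meta_new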
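
To make the infinite-loop concern concrete, here is a minimal sketch of the same resolution loop that runs without a PDF file. FakeIndirect, resolve and resolve_metadata are hypothetical names used only for illustration, not PyPDF2 APIs; FakeIndirect merely mimics the get_object() method of an indirect object. Instead of silently dropping a value after a few hops, this variant raises once a hop limit is exceeded, which is one possible design choice rather than what the library does.

```python
from typing import Any, Dict, Mapping


class FakeIndirect:
    """Hypothetical stand-in for an indirect reference (not a PyPDF2 class)."""

    def __init__(self, target: Any) -> None:
        self.target = target

    def get_object(self) -> Any:
        # Return whatever this reference points at (possibly another reference).
        return self.target


def resolve(value: Any, max_depth: int = 10) -> Any:
    """Follow a chain of references, raising if it exceeds max_depth hops."""
    depth = 0
    while isinstance(value, FakeIndirect):
        if depth >= max_depth:
            raise ValueError("reference chain too deep or circular")
        value = value.get_object()
        depth += 1
    return value


def resolve_metadata(meta: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a plain dict in which every value has been fully resolved."""
    return {key: resolve(value) for key, value in meta.items()}


if __name__ == "__main__":
    meta = {"/Title": FakeIndirect(FakeIndirect("Report")), "/Author": "Ada"}
    print(resolve_metadata(meta))
```

With a real reader, the isinstance check would be against IndirectObject as in the function above. Whether a circular chain should raise or be skipped is an open question for the actual fix.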

MartinThoma, Jul 30 '22 05:07