Unspecific type hints for reader.metadata

Open adamchainz opened this issue 1 year ago • 0 comments

Take the following example file, example.py:

from PyPDF2 import PdfReader

with open("example.pdf", "rb") as fp:
    reader = PdfReader(fp)
    metadata = reader.metadata
    assert metadata is not None
    date_str = metadata["/CreationDate"]
    date_str = date_str.removeprefix("D:").replace("'", "")
    print(date_str)

It runs fine:

$ python example.py
20220415093243+0200

but Mypy complains about calling removeprefix() on date_str:

$ mypy example.py
example.py:8: error: "PdfObject" has no attribute "removeprefix"  [attr-defined]
Found 1 error in 1 file (checked 1 source file)

This is due to DocumentInformation being a subclass of DictionaryObject, and thus only guaranteeing that the values returned are PdfObjects. In practice they seem to only be TextStringObjects, which subclass str. If they're always TextStringObjects, the types in DocumentInformation should be adjusted accordingly.
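Until the hints are tightened, one possible workaround is to narrow the value with isinstance before calling str methods. The sketch below is only an illustration: PdfObject, TextStringObject and DictionaryObject are simplified stand-ins defined here so the snippet runs without PyPDF2 or a PDF file, and creation_date is a hypothetical helper, not part of the library.

```python
class PdfObject:
    """Stand-in for PyPDF2's base object type (simplified assumption)."""


class TextStringObject(str, PdfObject):
    """Stand-in: a PDF text string that is also a plain str."""


class DictionaryObject(dict[str, PdfObject]):
    """Stand-in: lookups are only typed as returning PdfObject."""


def creation_date(metadata: DictionaryObject) -> str:
    """Hypothetical helper: return /CreationDate without the 'D:' prefix and quotes."""
    value = metadata["/CreationDate"]
    # Narrow the static type from PdfObject to str before using str methods.
    if not isinstance(value, str):
        raise TypeError(f"expected a text string, got {type(value).__name__}")
    return value.removeprefix("D:").replace("'", "")


info = DictionaryObject()
info["/CreationDate"] = TextStringObject("D:20220415093243+02'00'")
print(creation_date(info))
```

With the real library, the same isinstance check on metadata["/CreationDate"] should satisfy Mypy, at the cost of a runtime check that would be unnecessary if the values are in fact always TextStringObjects.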

Environment

$ python -m platform
macOS-12.5-arm64-arm-64bit

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.9.0

Code + PDF

See the example above; I used metadata.pdf from the PyPDF2 resources.

adamchainz, Aug 11 '22 10:08