
pdf.getDocumentInfo().title sometimes None

Open: clach04 opened this issue 5 years ago • 1 comment

Just found this fork/project after logging https://github.com/mstamy2/PyPDF3/issues/13. The test case below is for PyPDF4.

I've seen a number of PDF files where the title attribute/property is reported as None, yet accessing /Title directly returns content. I've no idea whether this is a problem with the PDF(s) or with PyPDF. There is a workaround, which may point to a potential change in PyPDF, but I'm unclear on what the correct behavior is here.

The attached PDF, title_bug.pdf, is about 5 MB and is a sample of a document that exhibits this behavior. I did not create it (nor do I know how it was created), so the only information we have is the metadata inside.

Test case, along with workaround below:

#!/usr/bin/env python
# -*- coding: windows-1252 -*-
# vim:ts=4:sw=4:softtabstop=4:smarttab:expandtab
#

import os
import sys

ver_to_test = 2
ver_to_test = 3
ver_to_test = 4

if ver_to_test == 4:
    from pypdf import PdfFileReader  # https://github.com/claird/PyPDF4
elif ver_to_test == 3:
    from PyPDF3 import PdfFileReader  # https://github.com/mstamy2/PyPDF3
else:
    from PyPDF2 import PdfFileReader  # https://github.com/mstamy2/PyPDF2 / https://pythonhosted.org/PyPDF2/


print('Python %s on %s' % (sys.version, sys.platform))

filename = 'title_bug.pdf'
f = open(filename, 'rb')
pdf = PdfFileReader(f)
info = pdf.documentInfo
#print(info)
print('title attribute %r' % info.title)  # reports None
print('title getText() %r' % info.getText("/Title"))  # this is what .title property calls
print('title get() %r' % info.get("/Title"))  # this is part of what dict[] does
print('title get().getObject() %r' % info.get("/Title").getObject())  # this is what dict[] does
print('/Title dict entry %r' % info['/Title'])  # with test pdf works
print('title attribute %r' % info.title)  # Sanity check it is still None
print('title Workaround %r' % (info.title or info['/Title']))  # Workaround
f.close()
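
The workaround on the last print line could be wrapped in a small helper. This is only a sketch: `title_with_fallback` is a made-up name, not part of any PyPDF release, and `FakeInfo` / `FakeIndirect` are stand-ins so the snippet runs without the attached PDF. The stand-ins assume what the comments in the test case suggest, namely that `.title` only accepts a plain text value while `dict[]` resolves the entry via `getObject()` first. Whether that is the actual cause for title_bug.pdf is not confirmed here.

```python
class FakeIndirect:
    """Stand-in for a reference that must be resolved (hypothetical, not PyPDF)."""

    def __init__(self, target):
        self._target = target

    def getObject(self):
        return self._target


class FakeInfo(dict):
    """Stand-in for the document info dictionary (hypothetical, not PyPDF)."""

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        # Assumed behavior: item access resolves the entry before returning it.
        return value.getObject() if hasattr(value, "getObject") else value

    @property
    def title(self):
        # Assumed behavior: only a plain string is accepted, otherwise None.
        # dict.get() does not go through __getitem__, so nothing is resolved.
        value = self.get("/Title")
        return value if isinstance(value, str) else None


def title_with_fallback(info):
    """Return info.title, else the resolved /Title entry, else None."""
    title = info.title
    if title:
        return title
    try:
        return info["/Title"]
    except KeyError:
        # Unlike the one-line workaround, a missing /Title does not raise.
        return None


direct = FakeInfo({"/Title": "Direct title"})
indirect = FakeInfo({"/Title": FakeIndirect("Indirect title")})
empty = FakeInfo()

print(title_with_fallback(direct))
print(title_with_fallback(indirect))
print(title_with_fallback(empty))
```

With a real reader, the same helper would be called as `title_with_fallback(pdf.documentInfo)`. The only difference from `info.title or info['/Title']` is that it returns None instead of raising KeyError when the PDF has no /Title entry at all.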

clach04, Aug 09 '19 03:08

fixed upstream

clach04, Apr 18 '22 15:04