pdfminer dumppdf.py -T throws an exception: "NameError: name 's' is not defined"

Open jeffstearns opened this issue 5 years ago • 7 comments

dumppdf.py throws an exception when the -T option is used.

pdfminer version 20191125 running with Python 3.7.6 on OS X 10.13.6

To reproduce:

Create a valid pdf.
Run dumppdf.py -T on this pdf.

% dumppdf.py -a -T ~/tmp/foo.pdf
<outlines>
Traceback (most recent call last):
  File "/usr/local/bin/dumppdf.py", line 272, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/usr/local/bin/dumppdf.py", line 269, in main
    dumpall=dumpall, mode=mode, extractdir=extractdir)
  File "/usr/local/bin/dumppdf.py", line 151, in dumpoutline
    outfp.write('<outline level="%r" title="%s">\n' % (level, q(s)))
NameError: name 's' is not defined

Jan 12 '20 19:01 jeffstearns

I'm not getting this error using the samples/simple1.pdf file. Could you share the pdf file you are using?

Jan 13 '20 21:01 pietermarsman

Here is a file that causes the problem.

I have many others that also fail. This file (and the others) were processed by ABBYY FineReader OCR software. That might be relevant to this bug.

dmv.pdf

Jan 15 '20 04:01 jeffstearns

I had the same problem as @jeffstearns. I changed line 151 of dumppdf.py to:

                outfp.write('<outline level="%r" title="%s">\n' % (level, q(title)))

and got rid of the error message.
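
For anyone wondering why this only shows up with some files, here is a minimal sketch of the failure mode. It is not the real dumppdf.py code: `q`, `write_outline_buggy` and `write_outline_fixed` are hypothetical stand-ins. The point is that Python only raises the NameError when the offending line actually runs, so a PDF with no outline entries never reaches it. That might be why samples/simple1.pdf did not reproduce the error, though that is an assumption.

```python
from xml.sax.saxutils import escape


def q(text):
    """Hypothetical stand-in for dumppdf.py's q(): escape text for an XML attribute."""
    return escape(text, {'"': '&quot;'})


def write_outline_buggy(outlines):
    """Mirrors the failing line: 's' is never assigned anywhere."""
    lines = []
    for level, title in outlines:
        # NameError is raised here, but only if the loop body executes.
        lines.append('<outline level="%r" title="%s">' % (level, q(s)))  # noqa: F821
    return lines


def write_outline_fixed(outlines):
    """Same loop with the workaround applied: use 'title' instead of 's'."""
    lines = []
    for level, title in outlines:
        lines.append('<outline level="%r" title="%s">' % (level, q(title)))
    return lines


if __name__ == "__main__":
    print(write_outline_buggy([]))               # no outline entries, no error
    print(write_outline_fixed([(1, "A & B")]))   # entry is written and escaped
```

With an empty outline list the buggy version returns without complaint; with at least one entry it fails exactly like the traceback above.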

Jul 21 '20 22:07 igavronski

After solving the first error message with @igavronski's workaround, I get another one:

File "dumppdf.py", line 143, in dumpoutline
    pageno = pages[dest[0].objid]
TypeError: 'PDFObjRef' object is not subscriptable
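
A plausible reading of this error is that the outline's destination is itself an indirect reference, so it has to be resolved to the underlying array before it can be indexed. The sketch below illustrates that idea with stand-ins only: `FakeObjRef`, `resolve_all` and `page_number_for` are hypothetical names, not pdfminer's API, and I have not checked this against the failing file.

```python
class FakeObjRef:
    """Hypothetical stand-in for an indirect PDF reference (not pdfminer's PDFObjRef)."""

    def __init__(self, objid, target=None):
        self.objid = objid
        self._target = target

    def resolve(self):
        return self._target


def resolve_all(obj):
    """Follow indirect references until a direct object is reached."""
    while isinstance(obj, FakeObjRef):
        obj = obj.resolve()
    return obj


def page_number_for(dest, pages):
    """Look up the page number for a destination, resolving it first."""
    dest = resolve_all(dest)
    if isinstance(dest, dict):
        # Destination dictionaries keep the actual array under 'D'.
        dest = resolve_all(dest['D'])
    # dest[0] is the page reference; keep it unresolved, we only need its objid.
    return pages[dest[0].objid]


if __name__ == "__main__":
    page_ref = FakeObjRef(7)
    pages = {7: 1}  # object id -> page number
    indirect_dest = FakeObjRef(12, target=[page_ref, 'Fit'])

    try:
        indirect_dest[0]  # what the failing line effectively does
    except TypeError as exc:
        print(exc)

    print(page_number_for(indirect_dest, pages))
```

Indexing the reference directly fails the same way as the traceback, while resolving it first lets the lookup go through. pdfminer ships a `resolve1` helper in `pdfminer.pdftypes` for this kind of resolution, which may be the right tool in the real script.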

Jan 29 '21 15:01 jrkager

Same issue as @jrkager

Jan 29 '21 16:01 ghost

@jrkager @entelven: same PDF (dmv.pdf) or another one? Could you please tell us your Python version and environment?

Jan 29 '21 19:01 igavronski

Another PDF. Python 3.9.1 in conda env.

However, the same command on the same PDF worked with pdfminer.six and I got what I wanted.

Jan 30 '21 22:01 jrkager
