pyPdf

Infinite loop on empty input

Open jsonn opened this issue 14 years ago • 6 comments

Create an empty StringIO and pass it to the PDF reader (PdfFileReader). It loops forever in the readNextEndLine calls, before read ever reaches the %%EOF check.
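
A rough Python 3 model of the stuck scan may help show why it never ends. It does not import pyPdf. `ClampingStream` and `unpatched_scan` are hypothetical names, the clamping of negative seeks to offset 0 is an assumption about cStringIO that matches the behaviour reported here, and the step budget exists only so the sketch terminates:

```python
import io


class ClampingStream:
    """Hypothetical stand-in for a Python 2 cStringIO object.

    Assumption: seeking before the start of the buffer clamps to offset 0
    instead of raising an error.
    """

    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self._size = len(data)

    def read(self, n):
        return self._buf.read(n)

    def tell(self):
        return self._buf.tell()

    def seek(self, offset, whence=0):
        # whence: 0 = from start, 1 = from current position, 2 = from end
        base = (0, self._buf.tell(), self._size)[whence]
        self._buf.seek(max(0, base + offset))


def unpatched_scan(stream, max_steps=1000):
    """Simplified model of the backward scan, capped at max_steps reads.

    Returns (line, finished). finished is False when the budget ran out,
    which is where the real, uncapped loop would keep spinning.
    """
    stream.seek(-1, 2)  # start at the end
    line = b""
    for _ in range(max_steps):
        x = stream.read(1)
        stream.seek(-2, 1)  # step back over the byte just read
        if x in (b"\n", b"\r"):
            return line, True
        line = x + line
    return line, False


line, finished = unpatched_scan(ClampingStream(b""))
print(repr(line), finished)
```

On empty input every read returns an empty string and every seek lands back on offset 0, so no line ending is ever seen and nothing else ends the loop.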

jsonn, Jan 16 '11 22:01

It also enters an infinite loop for single-line text files and for some other files.

tongwang, May 04 '12 20:05

Got this bug too!

alexgarel, Jun 18 '12 09:06

Proposed patch

diff --git a/pyPdf/pdf.py b/pyPdf/pdf.py
index bf60d01..586ea81 100644
--- a/pyPdf/pdf.py
+++ b/pyPdf/pdf.py
@@ -701,7 +701,7 @@ class PdfFileReader(object):
         # start at the end:
         stream.seek(-1, 2)
         line = ''
-        while not line:
+        while not line and stream.tell():
             line = self.readNextEndLine(stream)
         if line[:5] != "%%EOF":
             raise utils.PdfReadError, "EOF marker not found"

Without patch::

    >>> import pyPdf
    >>> from cStringIO import StringIO
    >>> c = StringIO('')
    >>> pdf = pyPdf.PdfFileReader(c)
    --- Infinite loop ---
    ^CTraceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
        self.read(stream)
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 705, in read
        line = self.readNextEndLine(stream)
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 870, in readNextEndLine
        line = x + line
    KeyboardInterrupt

With patch::

    >>> import pyPdf
    >>> from cStringIO import StringIO
    >>> c = StringIO('')
    >>> pdf = pyPdf.PdfFileReader(c)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
        self.read(stream)
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 707, in read
        raise utils.PdfReadError, "EOF marker not found"
    pyPdf.utils.PdfReadError: EOF marker not found

alexgarel, Jun 18 '12 10:06

Hmm, a better patch:

--- a/pyPdf/pdf.py
+++ b/pyPdf/pdf.py
@@ -701,7 +701,7 @@ class PdfFileReader(object):
         # start at the end:
         stream.seek(-1, 2)
         line = ''
-        while not line:
+        while not line and stream.tell():
             line = self.readNextEndLine(stream)
         if line[:5] != "%%EOF":
             raise utils.PdfReadError, "EOF marker not found"
@@ -857,7 +857,7 @@ class PdfFileReader(object):

     def readNextEndLine(self, stream):
         line = ""
-        while True:
+        while stream.tell():
             x = stream.read(1)
             stream.seek(-2, 1)
             if x == '\n' or x == '\r':
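
To see what the two `stream.tell()` guards change, here is a small Python 3 model of the patched loops. It is a sketch under assumptions, not pyPdf itself: `ClampingStream`, `read_next_end_line` and `check_eof_marker` are hypothetical names, negative seeks are assumed to clamp to offset 0 as cStringIO appears to do, `ValueError` stands in for `utils.PdfReadError`, and the handling of consecutive line endings (not shown in the diff) is reduced to a plain `break`:

```python
import io


class ClampingStream:
    """Hypothetical stand-in for a Python 2 cStringIO object.

    Assumption: seeking before the start of the buffer clamps to offset 0.
    """

    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self._size = len(data)

    def read(self, n):
        return self._buf.read(n)

    def tell(self):
        return self._buf.tell()

    def seek(self, offset, whence=0):
        # whence: 0 = from start, 1 = from current position, 2 = from end
        base = (0, self._buf.tell(), self._size)[whence]
        self._buf.seek(max(0, base + offset))


def read_next_end_line(stream):
    """Model of the patched inner loop: stop once the cursor is at offset 0."""
    line = b""
    while stream.tell():
        x = stream.read(1)
        stream.seek(-2, 1)  # step back over the byte just read
        if x in (b"\n", b"\r"):
            break  # simplification: the original also skips repeated line endings
        line = x + line
    return line


def check_eof_marker(data):
    """Model of the patched outer loop in read(), up to the %%EOF check."""
    stream = ClampingStream(data)
    stream.seek(-1, 2)  # start at the end
    line = b""
    while not line and stream.tell():
        line = read_next_end_line(stream)
    if line[:5] != b"%%EOF":
        raise ValueError("EOF marker not found")  # stands in for PdfReadError
    return line


for sample in (b"", b"  ", b"%PDF-1.4\n%%EOF\n"):
    try:
        print(repr(sample), "->", repr(check_eof_marker(sample)))
    except ValueError as exc:
        print(repr(sample), "->", exc)
```

In this model the empty and two-space inputs both end with "EOF marker not found" instead of spinning, and the input with a trailing `%%EOF` line is accepted. The model also never reads the byte at offset 0, so an input consisting of `%%EOF` alone would be rejected.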

This one works with an empty stream and also with a one-line stream::

    >>> import pyPdf
    >>> from cStringIO import StringIO
    >>> c = StringIO('  ')
    >>> pdf = pyPdf.PdfFileReader(c)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 374, in __init__
        self.read(stream)
      File "/tmp/pyPdf2/lib/python2.7/site-packages/pyPdf/pdf.py", line 707, in read
        raise utils.PdfReadError, "EOF marker not found"
    pyPdf.utils.PdfReadError: EOF marker not found

alexgarel, Jun 18 '12 15:06

The second chunk is not really going to work...

jsonn, Jun 18 '12 15:06

sorry, corrected :-)

alexgarel, Jun 18 '12 16:06