'not enough image data' exception from PIL

Open brianpow opened this issue 6 months ago • 0 comments

I am trying to extract images from PDF files, but PIL occasionally raises a 'not enough image data' exception on certain PDFs. The affected files display correctly in Atril Document Viewer, and extraction works with pdfimages from poppler-utils.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-6.5.0-kali3-amd64-x86_64-with-glibc2.37

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.2, crypt_provider=('cryptography', '38.0.4'), PIL=10.0.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader
import sys

for filename in sys.argv[1:]:
    reader = PdfReader(filename)
    for i, page in enumerate(reader.pages):
        for j, image in enumerate(page.images):
            print("Writing %d-%d: %s (%d)..." % (i, j, image.name, len(image.data)))
            with open(image.name, "wb") as fp:
                fp.write(image.data)
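
Until the root cause is found, a variant of the loop above could skip images that fail to decode instead of aborting the whole run. This is only a sketch: it assumes `page.images` supports `len()` and integer indexing (the traceback shows `__iter__` delegating to `self[i]`), and `iter_decodable` is a hypothetical helper name, not part of pypdf.

```python
from typing import Any, Iterator, Tuple


def iter_decodable(images: Any) -> Iterator[Tuple[int, Any]]:
    """Yield (index, image) pairs, skipping entries that raise ValueError.

    `images` is assumed to behave like `page.images`: it supports len()
    and integer indexing, and decoding happens inside __getitem__.
    """
    for index in range(len(images)):
        try:
            image = images[index]
        except ValueError as exc:
            # For example "not enough image data" raised by PIL while decoding.
            print(f"Skipping image {index}: {exc}")
            continue
        yield index, image
```

In the script above, `for j, image in iter_decodable(page.images):` would take the place of the `enumerate(page.images)` loop, so the remaining images on the page are still written.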

Share here the PDF file(s) that cause the issue. The smaller they are, the better. Let us know if we may add them to our tests!

test2_P038-038.pdf

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "/home/user/pypdf/pypdf_test.py", line 7, in <module>
    for j, image in enumerate(page.images):
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_page.py", line 2727, in __iter__
    yield self[i]
          ~~~~^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_page.py", line 2723, in __getitem__
    return self.get_function(lst[index])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_page.py", line 557, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/filters.py", line 785, in _xobj_to_image
    img, image_format, extension, _ = _handle_flate(
                                      ^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_xobj_image_helpers.py", line 172, in _handle_flate
    img = Image.frombytes(mode, size, data)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 2952, in frombytes
    im.frombytes(data, decoder_name, args)
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 805, in frombytes
    raise ValueError(msg)
ValueError: not enough image data
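
The last frame is Pillow's `Image.frombytes`, which raises this error when the buffer holds fewer bytes than the given mode and size require. A rough way to check a suspect image is to compare the length of the decoded stream with the expected byte count. The sketch below assumes 8 bits per channel with unpadded rows (and byte-aligned rows for 1-bit images); `BYTES_PER_PIXEL`, `expected_raw_size`, and `is_short` are hypothetical names, not pypdf or Pillow APIs.

```python
from typing import Tuple

# Bytes per pixel for a few common 8-bit Pillow modes (assumed subset).
BYTES_PER_PIXEL = {"L": 1, "P": 1, "RGB": 3, "RGBA": 4, "CMYK": 4}


def expected_raw_size(mode: str, width: int, height: int) -> int:
    """Return the number of raw bytes a width x height image in `mode` needs."""
    if mode == "1":
        # 1-bit images pack 8 pixels per byte; each row is padded to a whole byte.
        return ((width + 7) // 8) * height
    return BYTES_PER_PIXEL[mode] * width * height


def is_short(mode: str, size: Tuple[int, int], data: bytes) -> bool:
    """Return True if `data` holds fewer bytes than the mode and size require."""
    return len(data) < expected_raw_size(mode, *size)
```

For instance, a 4x4 `RGB` image needs 48 bytes, so a 40-byte buffer would be reported as short. Logging the mode, size, and data length just before the `Image.frombytes` call would show which of the three disagrees for the attached file.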

brianpow · Dec 15 '23 00:12