pypdf
'not enough image data' exception from PIL
I am trying to extract images from PDF files, but for certain PDFs pypdf occasionally fails with a 'not enough image data' exception raised by PIL. The affected files display correctly in Atril Document Viewer, and extraction works with pdfimages from poppler-utils.
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-6.5.0-kali3-amd64-x86_64-with-glibc2.37
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.2, crypt_provider=('cryptography', '38.0.4'), PIL=10.0.0
Code + PDF
This is a minimal, complete example that shows the issue:
from pypdf import PdfReader
import sys

for filename in sys.argv[1:]:
    reader = PdfReader(filename)
    for i, page in enumerate(reader.pages):
        for j, image in enumerate(page.images):
            print("Writing %d-%d: %s (%d)..." % (i, j, image.name, len(image.data)))
            with open(image.name, "wb") as fp:
                fp.write(image.data)
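As a possible stopgap while the underlying decode problem is open, the loop can fetch images by index instead of iterating over page.images directly, so that one failing image does not end the whole iteration. This is only a sketch: iter_images_tolerant is a hypothetical helper, not part of pypdf, and it assumes that page.images supports len() and integer indexing, which the __iter__ and __getitem__ frames in the traceback suggest.

```python
def iter_images_tolerant(images):
    """Yield (index, image, error) for every entry, continuing past failures.

    `images` is any object supporting len() and integer indexing
    (assumed here to include pypdf's `page.images`).
    Exactly one of `image` and `error` is None in each yielded tuple.
    """
    for index in range(len(images)):
        try:
            # The decode happens on item access, so the error surfaces here.
            image = images[index]
        except ValueError as exc:
            yield index, None, exc
            continue
        yield index, image, None
```

In the example above, the inner loop would become `for j, image, error in iter_images_tolerant(page.images):`, skipping or logging the entry whenever error is not None.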
Share here the PDF file(s) that cause the issue. The smaller they are, the better. Let us know if we may add them to our tests!
Traceback
This is the complete traceback I see:
Traceback (most recent call last):
  File "/home/user/pypdf/pypdf_test.py", line 7, in <module>
    for j, image in enumerate(page.images):
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_page.py", line 2727, in __iter__
    yield self[i]
          ~~~~^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_page.py", line 2723, in __getitem__
    return self.get_function(lst[index])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_page.py", line 557, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/filters.py", line 785, in _xobj_to_image
    img, image_format, extension, _ = _handle_flate(
                                      ^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/pypdf/_xobj_image_helpers.py", line 172, in _handle_flate
    img = Image.frombytes(mode, size, data)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 2952, in frombytes
    im.frombytes(data, decoder_name, args)
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 805, in frombytes
    raise ValueError(msg)
ValueError: not enough image data
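
The last pypdf frame calls Image.frombytes(mode, size, data), and the error message suggests that the decoded stream holds fewer bytes than the chosen mode and size require. A rough way to check that for a given image is to compare the decoded length against the size implied by its dimensions. The sketch below uses only the standard library; expected_raw_size and is_short are hypothetical helper names, and the sketch assumes uncompressed rows padded to whole bytes. Mapping a PDF colour space to a component count is not covered.

```python
def expected_raw_size(width, height, components, bits_per_component=8):
    """Bytes needed for an uncompressed image whose rows are padded to whole bytes.

    `components` is the number of colour channels per pixel
    (for example 1 for grayscale, 3 for RGB).
    """
    bits_per_row = width * components * bits_per_component
    bytes_per_row = (bits_per_row + 7) // 8  # round up to a whole byte
    return bytes_per_row * height


def is_short(data, width, height, components, bits_per_component=8):
    """Return True when `data` is shorter than the dimensions imply."""
    return len(data) < expected_raw_size(width, height, components, bits_per_component)
```

If is_short returns True for the decoded stream of a failing image, the stream is truncated relative to its declared width, height and colour depth, or the mode chosen for it does not match the actual data.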