
"OSError: encoder error -2 when writing image file" while enumerating images

Open michelcrypt4d4mus opened this issue 8 months ago • 3 comments

Exception while enumerating images.

This seems to be a regression: when I was using 3.14.0 in clown_sort, I rarely, if ever, had issues enumerating pages. Now I hit them in a large percentage of PDFs.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
3.11.5

$ python -c "import pypdf;print(pypdf._debug_versions)"
3.16.4

Code + PDF

See exception text. PDF attached; you can add it to your tests.
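In the traceback below, the first `UnsupportedOperation: fileno` is expected. pypdf's `_xobj_to_image` saves the image into an in-memory `BytesIO`, and Pillow's `ImageFile._save` first asks that stream for a file descriptor. A `BytesIO` has none, so Pillow catches the error and writes the encoded data through the stream object instead. The real failure is the JPEG 2000 encoder returning error -2; pypdf then retries once with a fresh `BytesIO`, which fails the same way (hence the second chained traceback). A minimal standard-library sketch of that first check follows; `fileno_or_none` is a hypothetical helper name, not a Pillow function.

```python
import io
from typing import Any, Optional


def fileno_or_none(fp: Any) -> Optional[int]:
    """Return the OS file descriptor behind fp, or None if it has none.

    Mirrors the first step of Pillow's ImageFile._save as shown in the
    traceback: try fp.fileno(), and treat AttributeError or
    io.UnsupportedOperation as "no descriptor, write via the stream instead".
    """
    try:
        return fp.fileno()
    except (AttributeError, io.UnsupportedOperation):
        return None


# An in-memory buffer, like the one pypdf saves into, has no descriptor,
# which is why the first error in the chain is harmless on its own.
print(fileno_or_none(io.BytesIO()))  # None
```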

Traceback

  ➤ OSError: encoder error -2 when writing image file while parsing embedded image 1 on page 3...
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/ImageFile.py:515 in _save                                                   │
│                                                                                                  │
│   512 │   # a tricky case.                                                                       │
│   513 │   bufsize = max(MAXBLOCK, bufsize, im.size[0] * 4)  # see RawEncode.c                    │
│   514 │   try:                                                                                   │
│ ❱ 515 │   │   fh = fp.fileno()                                                                   │
│   516 │   │   fp.flush()                                                                         │
│   517 │   │   _encode_tile(im, fp, tile, bufsize, fh)                                            │
│   518 │   except (AttributeError, io.UnsupportedOperation) as exc:                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnsupportedOperation: fileno

The above exception was the direct cause of the following exception:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/pypdf/filters.py:872 in _xobj_to_image                                          │
│                                                                                                  │
│   869 │                                                                                          │
│   870 │   img_byte_arr = BytesIO()                                                               │
│   871 │   try:                                                                                   │
│ ❱ 872 │   │   img.save(img_byte_arr, format=image_format)                                        │
│   873 │   except OSError:  # pragma: no cover                                                    │
│   874 │   │   # odd error                                                                        │
│   875 │   │   img_byte_arr = BytesIO()                                                           │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/Image.py:2438 in save                                                       │
│                                                                                                  │
│   2435 │   │   │   │   fp = builtins.open(filename, "w+b")                                       │
│   2436 │   │                                                                                     │
│   2437 │   │   try:                                                                              │
│ ❱ 2438 │   │   │   save_handler(self, fp, filename)                                              │
│   2439 │   │   except Exception:                                                                 │
│   2440 │   │   │   if open_fp:                                                                   │
│   2441 │   │   │   │   fp.close()                                                                │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/Jpeg2KImagePlugin.py:385 in _save                                           │
│                                                                                                  │
│   382 │   │   plt,                                                                               │
│   383 │   )                                                                                      │
│   384 │                                                                                          │
│ ❱ 385 │   ImageFile._save(im, fp, [("jpeg2k", (0, 0) + im.size, 0, kind)])                       │
│   386                                                                                            │
│   387                                                                                            │
│   388 # ------------------------------------------------------------                             │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/ImageFile.py:519 in _save                                                   │
│                                                                                                  │
│   516 │   │   fp.flush()                                                                         │
│   517 │   │   _encode_tile(im, fp, tile, bufsize, fh)                                            │
│   518 │   except (AttributeError, io.UnsupportedOperation) as exc:                               │
│ ❱ 519 │   │   _encode_tile(im, fp, tile, bufsize, None, exc)                                     │
│   520 │   if hasattr(fp, "flush"):                                                               │
│   521 │   │   fp.flush()                                                                         │
│   522                                                                                            │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/ImageFile.py:547 in _encode_tile                                            │
│                                                                                                  │
│   544 │   │   │   │   │   errcode = encoder.encode_to_file(fh, bufsize)                          │
│   545 │   │   │   if errcode < 0:                                                                │
│   546 │   │   │   │   msg = f"encoder error {errcode} when writing image file"                   │
│ ❱ 547 │   │   │   │   raise OSError(msg) from exc                                                │
│   548 │   │   finally:                                                                           │
│   549 │   │   │   encoder.cleanup()                                                              │
│   550                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: encoder error -2 when writing image file

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/ImageFile.py:515 in _save                                                   │
│                                                                                                  │
│   512 │   # a tricky case.                                                                       │
│   513 │   bufsize = max(MAXBLOCK, bufsize, im.size[0] * 4)  # see RawEncode.c                    │
│   514 │   try:                                                                                   │
│ ❱ 515 │   │   fh = fp.fileno()                                                                   │
│   516 │   │   fp.flush()                                                                         │
│   517 │   │   _encode_tile(im, fp, tile, bufsize, fh)                                            │
│   518 │   except (AttributeError, io.UnsupportedOperation) as exc:                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnsupportedOperation: fileno

The above exception was the direct cause of the following exception:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/uzor/workspace/clown_sort/clown_sort/files/pdf_file.py:61 in extracted_text      │
│                                                                                                  │
│    58 │   │   │   │                                                                              │
│    59 │   │   │   │   # Extracting images is a bit fraught (lots of PIL and pypdf exceptions h   │
│    60 │   │   │   │   try:                                                                       │
│ ❱  61 │   │   │   │   │   for image_number, image in enumerate(page.images, start=1):            │
│    62 │   │   │   │   │   │   image_name = f"Page {page_number}, Image {image_number}"           │
│    63 │   │   │   │   │   │   self._log_to_stderr(f"   Processing {image_name}...")              │
│    64 │   │   │   │   │   │   page_buffer.print(Panel(image_name, expand=False))                 │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/pypdf/_page.py:2722 in __iter__                                                 │
│                                                                                                  │
│   2719 │                                                                                         │
│   2720 │   def __iter__(self) -> Iterator[ImageFile]:                                            │
│   2721 │   │   for i in range(len(self)):                                                        │
│ ❱ 2722 │   │   │   yield self[i]                                                                 │
│   2723 │                                                                                         │
│   2724 │   def __str__(self) -> str:                                                             │
│   2725 │   │   p = [f"Image_{i}={n}" for i, n in enumerate(self.ids_function())]                 │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/pypdf/_page.py:2718 in __getitem__                                              │
│                                                                                                  │
│   2715 │   │   │   index = len_self + index                                                      │
│   2716 │   │   if index < 0 or index >= len_self:                                                │
│   2717 │   │   │   raise IndexError("sequence index out of range")                               │
│ ❱ 2718 │   │   return self.get_function(lst[index])                                              │
│   2719 │                                                                                         │
│   2720 │   def __iter__(self) -> Iterator[ImageFile]:                                            │
│   2721 │   │   for i in range(len(self)):                                                        │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/pypdf/_page.py:547 in _get_image                                                │
│                                                                                                  │
│    544 │   │   │   │   │   raise KeyError("no inline image can be found")                        │
│    545 │   │   │   │   return self.inline_images[id]                                             │
│    546 │   │   │                                                                                 │
│ ❱  547 │   │   │   imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))                      │
│    548 │   │   │   extension, byte_stream = imgd[:2]                                             │
│    549 │   │   │   f = ImageFile(                                                                │
│    550 │   │   │   │   name=f"{id[1:]}{extension}",                                              │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/pypdf/filters.py:876 in _xobj_to_image                                          │
│                                                                                                  │
│   873 │   except OSError:  # pragma: no cover                                                    │
│   874 │   │   # odd error                                                                        │
│   875 │   │   img_byte_arr = BytesIO()                                                           │
│ ❱ 876 │   │   img.save(img_byte_arr, format=image_format)                                        │
│   877 │   data = img_byte_arr.getvalue()                                                         │
│   878 │                                                                                          │
│   879 │   try:  # temporary try/except until other fixes of images                               │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/Image.py:2438 in save                                                       │
│                                                                                                  │
│   2435 │   │   │   │   fp = builtins.open(filename, "w+b")                                       │
│   2436 │   │                                                                                     │
│   2437 │   │   try:                                                                              │
│ ❱ 2438 │   │   │   save_handler(self, fp, filename)                                              │
│   2439 │   │   except Exception:                                                                 │
│   2440 │   │   │   if open_fp:                                                                   │
│   2441 │   │   │   │   fp.close()                                                                │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/Jpeg2KImagePlugin.py:385 in _save                                           │
│                                                                                                  │
│   382 │   │   plt,                                                                               │
│   383 │   )                                                                                      │
│   384 │                                                                                          │
│ ❱ 385 │   ImageFile._save(im, fp, [("jpeg2k", (0, 0) + im.size, 0, kind)])                       │
│   386                                                                                            │
│   387                                                                                            │
│   388 # ------------------------------------------------------------                             │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/ImageFile.py:519 in _save                                                   │
│                                                                                                  │
│   516 │   │   fp.flush()                                                                         │
│   517 │   │   _encode_tile(im, fp, tile, bufsize, fh)                                            │
│   518 │   except (AttributeError, io.UnsupportedOperation) as exc:                               │
│ ❱ 519 │   │   _encode_tile(im, fp, tile, bufsize, None, exc)                                     │
│   520 │   if hasattr(fp, "flush"):                                                               │
│   521 │   │   fp.flush()                                                                         │
│   522                                                                                            │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/ImageFile.py:547 in _encode_tile                                            │
│                                                                                                  │
│   544 │   │   │   │   │   errcode = encoder.encode_to_file(fh, bufsize)                          │
│   545 │   │   │   if errcode < 0:                                                                │
│   546 │   │   │   │   msg = f"encoder error {errcode} when writing image file"                   │
│ ❱ 547 │   │   │   │   raise OSError(msg) from exc                                                │
│   548 │   │   finally:                                                                           │
│   549 │   │   │   encoder.cleanup()                                                              │
│   550                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

michelcrypt4d4mus, Oct 24 '23 19:10

@michelcrypt4d4mus please provide the PDF file with the issue and a clear, simple code snippet to evaluate: reading the stack trace is awful.

pubpub-zz, Oct 24 '23 19:10

This is the line of code that is causing the error.

And sorry, I thought I had attached the file, but somehow it did not attach... Trying again: [Binance discovery responses 2 gov.uscourts.dcd.256060.140.1.pdf](https://github.com/py-pdf/pypdf/files/13126365/Binance.discovery.responses.2.gov.uscourts.dcd.256060.140.1.pdf)

michelcrypt4d4mus, Oct 24 '23 20:10

Like #2266, this issue seems to go away when I downgrade to pypdf 3.14.0.

michelcrypt4d4mus, Oct 24 '23 20:10

@michelcrypt4d4mus can you confirm the issue is still present with the latest release? Can you also provide a simple code snippet?

pubpub-zz, Apr 08 '24 20:04

It seems like this still raises the same issue:

>>> from pypdf import PdfReader
>>> reader = PdfReader('../Binance.discovery.responses.2.gov.uscourts.dcd.256060.140.1.pdf')
>>> for page in reader.pages:
...   print(page)
...   for image in page.images:
...     print(image)
...     print(image.image)
... 

stefan6419846, Apr 09 '24 05:04
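A `try` inside the `for image in page.images` body cannot skip just the bad image: the traceback shows `__iter__` calling `self[i]`, which decodes the image, so the exception escapes the iterator itself and ends the loop. One caller-side workaround, sketched below under the assumption that `page.images` supports `len()` and integer indexing (as the `__iter__`/`__getitem__` frames suggest), is to fetch each image by index and record per-image failures. `collect_images` is a hypothetical helper, not part of pypdf.

```python
from typing import Any, List, Tuple


def collect_images(images: Any) -> Tuple[List[Tuple[int, Any]], List[Tuple[int, Exception]]]:
    """Fetch each image by index so one undecodable image does not stop the rest.

    `images` is assumed to support len() and integer indexing, like
    pypdf's page.images in the traceback (__iter__ -> self[i]).
    """
    extracted: List[Tuple[int, Any]] = []
    failed: List[Tuple[int, Exception]] = []
    for index in range(len(images)):
        try:
            extracted.append((index, images[index]))
        except Exception as exc:  # deliberately broad: pypdf and Pillow raise several types (OSError here)
            failed.append((index, exc))
    return extracted, failed


# Assumed usage with pypdf (not executed here):
# from pypdf import PdfReader
# reader = PdfReader("Binance.discovery.responses.2.gov.uscourts.dcd.256060.140.1.pdf")
# for page in reader.pages:
#     images, errors = collect_images(page.images)
```

This only keeps enumeration going; the failing images are still lost until the underlying encoding problem is fixed.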

The error is due to Pillow not handling all formats as requested (JPEG 2000 with palette encoding). Worse, depending on the Pillow version, we may or may not get errors, and the image may or may not be corrupted. 😫😫

pubpub-zz, Apr 10 '24 20:04
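If palette-based JPEG 2000 is the trigger, one possible direction (not what pypdf actually does) is to expand the palette to true colour before the in-memory `img.save(..., format="JPEG2000")` call, for example with Pillow's `Image.convert`. The sketch below only chooses the target mode; the assumption that the encoder rejects `"P"`/`"PA"` images but accepts the other modes unchanged comes from the comment above, and `target_mode_for_jpeg2000` is a hypothetical helper.

```python
def target_mode_for_jpeg2000(mode: str, has_transparency: bool = False) -> str:
    """Pick a Pillow image mode to convert to before saving as JPEG 2000.

    Assumption: the JPEG 2000 encoder cannot take palette ("P"/"PA")
    images, so the palette is expanded; other modes pass through unchanged.
    """
    if mode == "PA":
        return "RGBA"  # palette plus alpha channel
    if mode == "P":
        # A palette image may carry transparency in img.info["transparency"].
        return "RGBA" if has_transparency else "RGB"
    return mode


# Hypothetical use right before the save (requires Pillow):
# img = img.convert(target_mode_for_jpeg2000(img.mode, "transparency" in img.info))
# img.save(buf, format="JPEG2000")
```

Converting changes the stored pixel format, so whether this is acceptable depends on how faithfully the extracted image must match the original stream.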

As mentioned, this was not an issue with pypdf 3.14.0.

michelcrypt4d4mus, Apr 11 '24 00:04