OSError: broken data stream when using OpenJPEG 2.5.2
Pillow 10.3.0 and 10.4.0 installed from pip fail to open a JPEG from this zip archive. Pillow 10.2.0 works fine.
Pillow 10.3.0 and 10.4.0 installed from git also work fine.
What did you do?
open jpeg from zip archive
What did you expect to happen?
jpeg is loaded successfully
What actually happened?
Pillow 10.3.0 and 10.4.0 installed from pip raise OSError.
Steps to reproduce
This command
docker run --rm -i -v ./selfie-min.jpeg.zip:/selfie-min.jpeg.zip python /bin/bash -ex <<EOF
unzip /selfie-min.jpeg.zip
pip install pillow==10.3.0
python -m PIL.report
python -c 'from PIL import Image; Image.open("selfie-min.jpeg").load()'
EOF
raises
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.12/site-packages/PIL/Jpeg2KImagePlugin.py", line 313, in load
return ImageFile.ImageFile.load(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/PIL/ImageFile.py", line 310, in load
raise _get_oserror(err_code, encoder=False)
OSError: broken data stream when reading image file
while this code
docker run --rm -i -v ./selfie-min.jpeg.zip:/selfie-min.jpeg.zip python /bin/bash <<EOF
set -ex
unzip /selfie-min.jpeg.zip
pip install git+https://github.com/python-pillow/Pillow.git@10.3.0
python -m PIL.report
python -c 'from PIL import Image; Image.open("selfie-min.jpeg").load()'
echo image loading succeeded
EOF
works fine
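The traceback above passes through Jpeg2KImagePlugin.py, which suggests the file holds JPEG 2000 data despite its .jpeg extension. A minimal sketch for confirming that from the leading bytes of the file, using only the standard library, might look like this. The function names sniff_format and sniff_file are hypothetical helpers, not part of Pillow.

```python
# Leading bytes ("magic numbers") of the formats involved.
JPEG_SOI = bytes.fromhex("ffd8ff")                       # classic JPEG start-of-image
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")  # JP2 container signature box
J2K_CODESTREAM = bytes.fromhex("ff4fff51")               # raw JPEG 2000 codestream (SOC + SIZ)


def sniff_format(header: bytes) -> str:
    """Guess the image format from the first bytes of a file."""
    if header.startswith(JP2_SIGNATURE):
        return "JPEG 2000 (JP2 container)"
    if header.startswith(J2K_CODESTREAM):
        return "JPEG 2000 (raw codestream)"
    if header.startswith(JPEG_SOI):
        return "JPEG"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(12))
```

Calling sniff_file("selfie-min.jpeg") after unzipping would show which decoder the file is expected to hit.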
What are your OS, Python and Pillow versions?
compiled from source
--------------------------------------------------------------------
--- PIL CORE support ok, compiled for 10.3.0
--- TKINTER support ok, loaded 8.6
--- FREETYPE2 support ok, loaded 2.12.1
--- LITTLECMS2 support ok, loaded 2.14
--- WEBP support ok, loaded 1.2.4
--- WEBP Transparency support ok
--- WEBPMUX support ok
--- WEBP Animation support ok
--- JPEG support ok, compiled for libjpeg-turbo 2.1.5
--- OPENJPEG (JPEG2000) support ok, loaded 2.5.0
--- ZLIB (PNG/ZIP) support ok, loaded 1.2.13
--- LIBTIFF support ok, loaded 4.5.0
*** RAQM (Bidirectional Text) support not installed
*** LIBIMAGEQUANT (Quantization method) support not installed
--- XCB (X protocol) support ok
--------------------------------------------------------------------
installed from pip
--------------------------------------------------------------------
--- PIL CORE support ok, compiled for 10.3.0
--- TKINTER support ok, loaded 8.6
--- FREETYPE2 support ok, loaded 2.13.2
--- LITTLECMS2 support ok, loaded 2.16
--- WEBP support ok, loaded 1.3.2
--- WEBP Transparency support ok
--- WEBPMUX support ok
--- WEBP Animation support ok
--- JPEG support ok, compiled for libjpeg-turbo 3.0.2
--- OPENJPEG (JPEG2000) support ok, loaded 2.5.2
--- ZLIB (PNG/ZIP) support ok, loaded 1.2.13
--- LIBTIFF support ok, loaded 4.6.0
--- RAQM (Bidirectional Text) support ok, loaded 0.10.1, fribidi 1.0.8, harfbuzz 8.4.0
*** LIBIMAGEQUANT (Quantization method) support not installed
--- XCB (X protocol) support ok
--------------------------------------------------------------------
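To spot which bundled libraries differ between the two reports above, a small sketch like the following could parse the "support ok" lines and list only the versions that changed. The helper names parse_report and diff_reports are hypothetical, and the regular expression assumes the report layout shown here.

```python
from __future__ import annotations

import re

# Matches lines such as "--- OPENJPEG (JPEG2000) support ok, loaded 2.5.0"
# and "--- JPEG support ok, compiled for libjpeg-turbo 2.1.5".
_LINE = re.compile(
    r"^--- (?P<name>.+?) support ok(?:, (?:loaded|compiled for) (?P<version>.+))?$"
)


def parse_report(text: str) -> dict[str, str]:
    """Map each feature name to its reported version string, if any."""
    versions = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match and match.group("version"):
            versions[match.group("name")] = match.group("version")
    return versions


def diff_reports(a: str, b: str) -> dict[str, tuple[str | None, str | None]]:
    """Return only the features whose versions differ between two reports."""
    va, vb = parse_report(a), parse_report(b)
    return {
        name: (va.get(name), vb.get(name))
        for name in sorted(va.keys() | vb.keys())
        if va.get(name) != vb.get(name)
    }
```

Applied to the two reports in this issue, the OPENJPEG entry would show up as 2.5.0 versus 2.5.2, alongside the other libraries that differ.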
I think if you use LOAD_TRUNCATED_IMAGES, you will be able to load the image in all of your environments.
from PIL import Image, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
Image.open("selfie-min.jpeg").load()
https://github.com/python-pillow/Pillow/blob/d6cfebd016db25549d05a9c5caa2ef3b53cff5c5/src/PIL/ImageFile.py#L307
LOAD_TRUNCATED_IMAGES just swallows the error. The loaded image comes out blank, although the actual image is not blank.
docker run --rm -i -v ./selfie-min.jpeg.zip:/selfie-min.jpeg.zip python /bin/bash <<EOF
unzip /selfie-min.jpeg.zip
pip install pillow==10.3.0 numpy
python -m PIL.report
python -c 'import numpy as np; from PIL import Image, ImageFile; ImageFile.LOAD_TRUNCATED_IMAGES = True; img = Image.open("selfie-min.jpeg"); img.load(); print(np.array(img).std())'
EOF
prints 0.0
docker run --rm -i -v ./selfie-min.jpeg.zip:/selfie-min.jpeg.zip python /bin/bash <<EOF
set -ex
unzip /selfie-min.jpeg.zip
pip install git+https://github.com/python-pillow/Pillow.git@10.3.0
pip install numpy
python -m PIL.report
python -c 'import numpy as np; from PIL import Image, ImageFile; ImageFile.LOAD_TRUNCATED_IMAGES = True; img = Image.open("selfie-min.jpeg"); img.load(); print(np.array(img).std())'
EOF
prints 93.807993182709
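A standard deviation of 0.0 means every sample in the decoded image has the same value, which is why the first result indicates a blank image. The same check can be sketched without numpy, assuming an 8-bit mode such as "L" or "RGB" where each byte is one sample. The helper names sample_std and looks_blank are hypothetical.

```python
import statistics


def sample_std(samples: bytes) -> float:
    """Population standard deviation of 8-bit samples (0.0 for empty input)."""
    if not samples:
        return 0.0
    return statistics.pstdev(samples)


def looks_blank(samples: bytes) -> bool:
    """True when every sample has the same value, i.e. the spread is zero."""
    return len(set(samples)) <= 1
```

With Pillow, the raw samples could be obtained with img.tobytes() after img.load(). For large images the set-based check is much cheaper than computing the full standard deviation.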
It seems to be a JPEG 2000 issue. A git bisect of OpenJPEG between v2.5.0 and v2.5.2 points to this commit: https://github.com/uclouvain/openjpeg/commit/0f528e95788863608aa1772f5370659edf618793
That might not actually be an OpenJPEG bug, but an unintended interaction with our heuristic for determining the color space: https://github.com/python-pillow/Pillow/blob/d6cfebd016db25549d05a9c5caa2ef3b53cff5c5/src/libImaging/Jpeg2KDecode.c#L708-L784
The color space was previously loaded only after our heuristic, whereas the commit you linked moves it to before, so we might need to adjust it. However, I would need to check it with a debugger to be sure that is the issue.
Is the image that you've provided one that could be added to our test suite, and distributed under the Pillow license?
Most likely yes. I have asked my teammates to be sure. I will let you know once the copyright situation is clear.
If we stop throwing an error when the color space is unknown, and treat it like we treat an unspecified color space instead, the image loads correctly.
diff --git a/src/libImaging/Jpeg2KDecode.c b/src/libImaging/Jpeg2KDecode.c
index 5b3d7ffc4..a81a14a09 100644
--- a/src/libImaging/Jpeg2KDecode.c
+++ b/src/libImaging/Jpeg2KDecode.c
@@ -698,8 +698,7 @@ j2k_decode_entry(Imaging im, ImagingCodecState state) {
}
/* Check that this image is something we can handle */
- if (image->numcomps < 1 || image->numcomps > 4 ||
- image->color_space == OPJ_CLRSPC_UNKNOWN) {
+ if (image->numcomps < 1 || image->numcomps > 4) {
state->errcode = IMAGING_CODEC_BROKEN;
state->state = J2K_STATE_FAILED;
goto quick_exit;
@@ -744,7 +743,7 @@ j2k_decode_entry(Imaging im, ImagingCodecState state) {
/* Find the correct unpacker */
color_space = image->color_space;
- if (color_space == OPJ_CLRSPC_UNSPECIFIED) {
+ if (color_space == OPJ_CLRSPC_UNKNOWN || color_space == OPJ_CLRSPC_UNSPECIFIED) {
switch (image->numcomps) {
case 1:
case 2:
I've created https://github.com/python-pillow/Pillow/pull/8343 to resolve this.
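The effect of the diff above can be modeled in a few lines of Python. This is only a sketch of the decision logic, not Pillow's actual code: the ColorSpace enum, BrokenStreamError, and resolve_color_space are hypothetical names, and the mapping of 3 or 4 components to sRGB is an assumption, since the diff is cut off before those cases.

```python
from enum import Enum, auto


class ColorSpace(Enum):
    """Stand-in for OpenJPEG's color space constants (values are arbitrary)."""
    UNKNOWN = auto()
    UNSPECIFIED = auto()
    SRGB = auto()
    GRAY = auto()
    SYCC = auto()


class BrokenStreamError(Exception):
    """Stand-in for the IMAGING_CODEC_BROKEN error path."""


def resolve_color_space(color_space: ColorSpace, numcomps: int, *, patched: bool) -> ColorSpace:
    """Model the check and fallback before (patched=False) and after the diff."""
    if numcomps < 1 or numcomps > 4:
        raise BrokenStreamError("unsupported number of components")
    if not patched and color_space is ColorSpace.UNKNOWN:
        # Old behavior: an unknown color space is rejected outright.
        raise BrokenStreamError("unknown color space")

    fallback = {ColorSpace.UNSPECIFIED}
    if patched:
        # New behavior: unknown is treated like unspecified.
        fallback.add(ColorSpace.UNKNOWN)

    if color_space in fallback:
        # Assumption: 1-2 components mean grayscale, 3-4 mean sRGB.
        return ColorSpace.GRAY if numcomps <= 2 else ColorSpace.SRGB
    return color_space
```

In this model, an image with an unknown color space and three components raises before the patch and resolves to sRGB after it, which matches the reported change from an OSError to a correctly loaded image.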