pypdf
Pillow 10.3.0 breaks test_filters.test_rgba
Running the tests with Pillow==10.3.0 breaks test_filters.test_rgba. With Pillow==10.2.0, the test passes.
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-5.14.21-150400.24.100-default-x86_64-with-glibc2.31
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.1.0, crypt_provider=('cryptography', '42.0.5'), PIL=10.3.0
Code + PDF
Just run pytest -k 'test_rgba'.
Expected image:
Actual image:
Traceback
This is the complete traceback I see:
__________________________________ test_rgba ___________________________________
[gw3] linux -- Python 3.12.2 /opt/hostedtoolcache/Python/3.12.2/x64/bin/python

    @pytest.mark.enable_socket()
    def test_rgba():
        """Decode rgb with transparency"""
        reader = PdfReader(BytesIO(get_data_from_url(name="tika-972174.pdf")))
        data = reader.pages[0].images[0]
        assert ".jp2" in data.name
        similarity = image_similarity(
            data.image, BytesIO(get_data_from_url(name="tika-972174_p0-im0.png"))
        )
>       assert similarity > 0.99
E       assert 0.6877076861263712 > 0.99

tests/test_filters.py:380: AssertionError
An upstream fix for this is available as a PR for the next Pillow release.
With that patch applied, test_filters.test_rgba and test_workflows.py.test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/972/972174.pdf-tika-972174.pdf] still break slightly. This can be fixed by setting ImageFile.LOAD_TRUNCATED_IMAGES = True for the scope of the corresponding test method.
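One way to limit the flag to a single test method is sketched below. It assumes unittest.mock is acceptable in the test suite (pytest's monkeypatch fixture would work just as well), and the test name and body are hypothetical placeholders, not the actual test_rgba.

```python
from unittest import mock

from PIL import ImageFile


# patch.object sets the attribute only while the decorated function runs and
# restores the previous value afterwards, even if an assertion inside fails.
@mock.patch.object(ImageFile, "LOAD_TRUNCATED_IMAGES", True)
def test_rgba_tolerating_truncation():
    # Hypothetical placeholder body: the real test would decode
    # tika-972174.pdf and compare the extracted image here.
    assert ImageFile.LOAD_TRUNCATED_IMAGES is True
```

Because the patch is undone when the test returns, other tests keep Pillow's default behaviour of rejecting truncated images.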
I am not sure whether we should exclude Pillow==10.3.0 in pypdf for now, or treat this as an issue that rarely occurs and that we have no control over anyway. Personally, I would not restrict the Pillow version for now.
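If the release were excluded after all, a "!=" clause in the requirement would be enough. The sketch below assumes the third-party packaging library (not mentioned in the thread) and uses a hypothetical requirement string, not pypdf's actual dependency declaration, to show that such a specifier rejects only the affected version.

```python
from packaging.requirements import Requirement

# Hypothetical requirement that excludes only the problematic Pillow release.
PILLOW_REQUIREMENT = Requirement("Pillow!=10.3.0")


def is_allowed(version: str) -> bool:
    """Return True if the given Pillow version satisfies the requirement."""
    return PILLOW_REQUIREMENT.specifier.contains(version)
```

With this specifier, 10.2.0 and any later fixed release remain installable; only 10.3.0 is rejected.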
@stefan6419846 can you confirm that the transparency is correct?
@pubpub-zz The alpha masking is done in a separate step and looks correct.
This is the newly rendered image after applying the patch:
The file size differs slightly, but I could not see any real visual difference when comparing it to the reference image.