[Data] Allow configuration of MAX_IMAGE_PIXELS in ImageDatasource

Open andrewsykim opened this issue 1 year ago • 2 comments

Why are these changes needed?

PIL.Image enforces a default limit on image size (Image.MAX_IMAGE_PIXELS) to guard against decompression bomb DOS attacks, so sufficiently large images fail to load. Example stack trace:

  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/image_datasource.py", line 81, in _read_stream
    image = Image.open(io.BytesIO(data))
  File "/home/ray/anaconda3/lib/python3.8/site-packages/PIL/Image.py", line 3133, in open
    im = _open_core(fp, filename, prefix, formats)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/PIL/Image.py", line 3120, in _open_core
    _decompression_bomb_check(im.size)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/PIL/Image.py", line 3029, in _decompression_bomb_check
    raise DecompressionBombError(
PIL.Image.DecompressionBombError: Image size (222926661 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.
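The limit in the trace above is twice `PIL.Image.MAX_IMAGE_PIXELS`. Pillow's default is `int(1024 * 1024 * 1024 // 4 // 3)`, or 89,478,485 pixels. It emits a `DecompressionBombWarning` above that value and raises `DecompressionBombError` only above 2 × 89,478,485 = 178,956,970 pixels. The following pure-Python sketch mirrors those thresholds so they are easy to see. `classify_image_size` is a hypothetical helper, not part of Pillow's API, and Pillow's private `_decompression_bomb_check` may differ in detail between versions.

```python
from typing import Optional, Tuple

# Pillow's default for Image.MAX_IMAGE_PIXELS (about 89.5 megapixels).
DEFAULT_MAX_IMAGE_PIXELS = int(1024 * 1024 * 1024 // 4 // 3)


def classify_image_size(
    size: Tuple[int, int],
    max_image_pixels: Optional[int] = DEFAULT_MAX_IMAGE_PIXELS,
) -> str:
    """Hypothetical mirror of Pillow's decompression bomb thresholds.

    Returns "ok", "warn" (Pillow would emit DecompressionBombWarning),
    or "error" (Pillow would raise DecompressionBombError).
    """
    if max_image_pixels is None:
        # Setting Image.MAX_IMAGE_PIXELS = None disables the check entirely.
        return "ok"
    width, height = size
    pixels = width * height
    if pixels > 2 * max_image_pixels:
        # Hard failure only above twice the configured value.
        return "error"
    if pixels > max_image_pixels:
        # Between 1x and 2x the configured value, Pillow only warns.
        return "warn"
    return "ok"
```

For instance, the 222,926,661-pixel image from the trace classifies as "error" under the default limit. A 10,000 × 10,000 image (100,000,000 pixels) would only trigger the warning. With `max_image_pixels=4`, the reported limit is 8 pixels.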

However, when processing a trusted dataset this limit should be configurable: users should be able to set it when calling ray.data.read_images (via a max_image_pixels argument).
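One way a datasource could honor a user-supplied limit is to override `PIL.Image.MAX_IMAGE_PIXELS` only while decoding and restore it afterwards. This is a minimal sketch, not the PR's implementation. `open_image_with_limit` is a hypothetical helper. Because `MAX_IMAGE_PIXELS` is a module-level global, the override affects every thread in the worker process while it is active.

```python
import io
from typing import Optional

from PIL import Image


def open_image_with_limit(data: bytes, max_image_pixels: Optional[int]) -> Image.Image:
    """Decode image bytes with a temporary MAX_IMAGE_PIXELS override.

    Hypothetical helper. As in Pillow itself, passing None disables the
    decompression bomb check entirely.
    """
    previous = Image.MAX_IMAGE_PIXELS
    # Module-level global: the override is visible to every thread in this process.
    Image.MAX_IMAGE_PIXELS = max_image_pixels
    try:
        image = Image.open(io.BytesIO(data))
        # Image.open is lazy; load() decodes the pixel data while the override is active.
        image.load()
        return image
    finally:
        # Always restore the previous limit, even if decoding raised.
        Image.MAX_IMAGE_PIXELS = previous
```

The value should be passed through to each read task rather than set once on the driver. Ray executes reads in separate worker processes, so assigning `Image.MAX_IMAGE_PIXELS` on the driver would have no effect there.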

Related issue number

Checks

  • [X] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

andrewsykim, May 17 '24 14:05

I rebased the PR onto the latest master to avoid some CI bugs. My apologies if this interrupts your work in any way, thanks!

can-anyscale, May 17 '24 17:05

I tried to add a unit test, but I don't think the existing tests exercise reading actual images. Instead, I manually ran this script to test the changes:

import ray

DATA_URI = "gs://anonymous@ray-images/images"

def main():
    ray.init()
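    # Deliberately tiny limit so that every image trips PIL's decompression bomb check.
    dataset = ray.data.read_images(DATA_URI, include_paths=True, max_image_pixels=4)

    # Ray Data is lazy; iterating over the batches forces the images to be read.
    dataset_iter = dataset.iter_batches(batch_size=None)
    for _ in dataset_iter:
        pass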

if __name__ == "__main__":
    main()

Exception raised:

  File "/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.11/site-packages/PIL/Image.py", line 3120, in _open_core
    _decompression_bomb_check(im.size)
  File "/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.11/site-packages/PIL/Image.py", line 3029, in _decompression_bomb_check
    raise DecompressionBombError(
PIL.Image.DecompressionBombError: Image size (47160 pixels) exceeds limit of 8 pixels, could be decompression bomb DOS attack.

andrewsykim, May 19 '24 02:05

cc @c21 would you mind reviewing this PR? Thanks!

kevin85421, May 21 '24 16:05

Can you also add a test?

I tried to add a unit test, but I couldn't get it to pass because the test didn't seem to actually read images. Can you point me to an example unit test that actually reads images?
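A unit test can exercise real image decoding without external data by generating a small image file on the fly. The sketch below uses only Pillow and the standard library. In Ray's own test suite, the same fixture directory could presumably be passed to `ray.data.read_images(..., max_image_pixels=...)`. That call is omitted here because the parameter only exists in this PR. `write_png` and `test_lowered_limit_rejects_real_image` are hypothetical names. Under pytest, the built-in `tmp_path` fixture and `pytest.raises` would be the more idiomatic choices.

```python
import os
import tempfile

from PIL import Image


def write_png(directory: str, name: str, size: tuple) -> str:
    """Write a solid-color RGB PNG of the given (width, height) and return its path."""
    path = os.path.join(directory, name)
    Image.new("RGB", size, color=(255, 0, 0)).save(path)
    return path


def test_lowered_limit_rejects_real_image():
    previous = Image.MAX_IMAGE_PIXELS
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = write_png(tmp_dir, "small.png", (32, 32))  # 1,024 pixels
        try:
            # Pillow raises above 2 * MAX_IMAGE_PIXELS, so 1,024 > 2 * 4 must fail.
            Image.MAX_IMAGE_PIXELS = 4
            try:
                with Image.open(path):
                    pass
            except Image.DecompressionBombError:
                pass
            else:
                raise AssertionError("expected DecompressionBombError")

            # A generous limit lets the same file decode normally.
            Image.MAX_IMAGE_PIXELS = 10_000
            with Image.open(path) as image:
                image.load()
                assert image.size == (32, 32)
        finally:
            # Restore the global limit so other tests are unaffected.
            Image.MAX_IMAGE_PIXELS = previous
```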

andrewsykim, May 30 '24 01:05

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stale[bot], Feb 25 '25 04:02