
[Data] Allow configuration of MAX_IMAGE_PIXELS in ImageDatasource

Open andrewsykim opened this issue 9 months ago • 2 comments

Why are these changes needed?

PIL.Image sets a default limit on image size to guard against decompression bomb DoS attacks. Reading an image above that limit through ImageDatasource fails. Example stack trace:

  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/image_datasource.py", line 81, in _read_stream
    image = Image.open(io.BytesIO(data))
  File "/home/ray/anaconda3/lib/python3.8/site-packages/PIL/Image.py", line 3133, in open
    im = _open_core(fp, filename, prefix, formats)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/PIL/Image.py", line 3120, in _open_core
    _decompression_bomb_check(im.size)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/PIL/Image.py", line 3029, in _decompression_bomb_check
    raise DecompressionBombError(
PIL.Image.DecompressionBombError: Image size (222926661 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.

However, this limit is unnecessarily restrictive when the images come from a trusted dataset, so users should be able to configure it when calling ray.data.read_images.
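For context, here is a minimal sketch of how the limit can be changed in Pillow itself, outside of Ray. The helper name `open_with_pixel_limit` is hypothetical and is not part of this PR or of Ray; it only relies on the module-level `PIL.Image.MAX_IMAGE_PIXELS` setting, where `None` disables the check. The setting is global to the process, so the sketch restores the previous value afterwards and is not thread-safe.

```python
import io

from PIL import Image


def open_with_pixel_limit(data: bytes, max_image_pixels):
    """Open image bytes with a temporary Pillow pixel limit.

    Hypothetical helper for illustration only. `max_image_pixels=None`
    disables Pillow's decompression bomb check entirely.
    """
    previous = Image.MAX_IMAGE_PIXELS
    Image.MAX_IMAGE_PIXELS = max_image_pixels
    try:
        # Image.open runs the decompression bomb check on the header size.
        image = Image.open(io.BytesIO(data))
        image.load()  # decode while the temporary limit is still active
        return image
    finally:
        # Restore the process-wide setting for other callers.
        Image.MAX_IMAGE_PIXELS = previous
```

Because the setting is process-wide, changing it on the driver would likely not affect images decoded in Ray worker processes, which is one reason to pass the limit through the datasource instead.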

Related issue number

Checks

  • [X] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

andrewsykim, May 17 '24 14:05

I rebased the PR onto the latest master to avoid some bugs in CI. My apologies if this interrupts your work in any way, thanks.

can-anyscale, May 17 '24 17:05

I tried to add a unit test, but I don't think the existing tests exercise reading actual images. I manually ran this script to test the changes:

import ray

DATA_URI = "gs://anonymous@ray-images/images"

def main():
    ray.init()
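    # Use a deliberately tiny limit so that any real image trips Pillow's check.
    dataset = ray.data.read_images(DATA_URI, include_paths=True, max_image_pixels=4)

    # Reads are lazy, so iterate over the batches to force the images to be decoded.
    dataset_iter = dataset.iter_batches(batch_size=None)
    for _ in dataset_iter:
        pass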

if __name__ == "__main__":
    main()

Exception raised:

  File "/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.11/site-packages/PIL/Image.py", line 3120, in _open_core
    _decompression_bomb_check(im.size)
  File "/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.11/site-packages/PIL/Image.py", line 3029, in _decompression_bomb_check
    raise DecompressionBombError(
PIL.Image.DecompressionBombError: Image size (47160 pixels) exceeds limit of 8 pixels, could be decompression bomb DOS attack.
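The reported limit of 8 pixels for max_image_pixels=4 follows from how Pillow applies the setting: it warns when an image exceeds MAX_IMAGE_PIXELS and raises only when the image exceeds twice that value. The pure-Python sketch below mirrors that rule for illustration. The function name `classify_pixel_count` is hypothetical and is not Pillow's or Ray's API; the default constant is Pillow's `int(1024 * 1024 * 1024 // 4 // 3)`.

```python
# Pillow's default MAX_IMAGE_PIXELS: int(1024 * 1024 * 1024 // 4 // 3)
DEFAULT_MAX_IMAGE_PIXELS = 89478485


def classify_pixel_count(pixels, max_image_pixels=DEFAULT_MAX_IMAGE_PIXELS):
    """Mirror Pillow's decompression bomb rule (illustrative, not Pillow's API).

    Returns "error" above twice the limit, "warning" above the limit,
    and "ok" otherwise or when the limit is None (check disabled).
    """
    if max_image_pixels is None:
        return "ok"
    if pixels > 2 * max_image_pixels:
        return "error"
    if pixels > max_image_pixels:
        return "warning"
    return "ok"


# The two failures quoted in this PR:
print(classify_pixel_count(222926661))  # default limit, error threshold 178956970
print(classify_pixel_count(47160, 4))   # max_image_pixels=4, error threshold 8
```

This also explains the 178956970 figure in the original stack trace, which is twice the default of 89478485.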

andrewsykim, May 19 '24 02:05