[Data] Allow configuration of MAX_IMAGE_PIXELS in ImageDatasource
Why are these changes needed?
PIL.Image sets a default limit on image size to prevent decompression bomb DoS attacks. Reading an image above that limit fails. Example stack trace:
File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/data/datasource/image_datasource.py", line 81, in _read_stream
image = Image.open(io.BytesIO(data))
File "/home/ray/anaconda3/lib/python3.8/site-packages/PIL/Image.py", line 3133, in open
im = _open_core(fp, filename, prefix, formats)
File "/home/ray/anaconda3/lib/python3.8/site-packages/PIL/Image.py", line 3120, in _open_core
_decompression_bomb_check(im.size)
File "/home/ray/anaconda3/lib/python3.8/site-packages/PIL/Image.py", line 3029, in _decompression_bomb_check
raise DecompressionBombError(
PIL.Image.DecompressionBombError: Image size (222926661 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.
However, the limit should be configurable when processing images from a trusted dataset, and users should be able to set it when calling ray.data.read_images.
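The limit in the error message (178956970) is twice Pillow's default MAX_IMAGE_PIXELS (89478485): Pillow warns above the configured value and raises only above double it. The following is a minimal pure-Python model of that check, not Pillow's actual code; `check_pixels` and `BombError` are hypothetical names used only for illustration.

```python
import warnings

# Pillow's default value of Image.MAX_IMAGE_PIXELS:
# int(1024 * 1024 * 1024 // 4 // 3)
DEFAULT_MAX_IMAGE_PIXELS = 89478485


class BombError(Exception):
    """Hypothetical stand-in for PIL.Image.DecompressionBombError."""


def check_pixels(pixels, max_image_pixels=DEFAULT_MAX_IMAGE_PIXELS):
    """Model of the size check: returns "ok" or "warning", or raises BombError."""
    if max_image_pixels is None:
        # A limit of None disables the check entirely.
        return "ok"
    if pixels > 2 * max_image_pixels:
        # Hard failure only above twice the configured limit.
        raise BombError(
            f"Image size ({pixels} pixels) exceeds limit of "
            f"{2 * max_image_pixels} pixels"
        )
    if pixels > max_image_pixels:
        # Between 1x and 2x the limit, only a warning is issued.
        warnings.warn(f"Image size ({pixels} pixels) exceeds limit")
        return "warning"
    return "ok"


if __name__ == "__main__":
    try:
        check_pixels(222926661)  # the image size from the stack trace above
    except BombError as exc:
        print(exc)
```

Under this model, an image with 222926661 pixels fails against the default, while a larger configured value (or None) lets it through.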
Related issue number
Checks
- [X] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- [ ] I've run scripts/format.sh to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
I rebased the PR onto the latest master to avoid some bugs in CI. My apologies if this interrupts your work in any way, thanks.
I tried to add a unit test, but I don't think the existing tests exercise reading actual images. I manually ran this script to test the changes:
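import ray

DATA_URI = "gs://anonymous@ray-images/images"


def main():
    ray.init()
    # Use a deliberately tiny limit so that every image trips the check.
    dataset = ray.data.read_images(DATA_URI, include_paths=True, max_image_pixels=4)
    dataset_iter = dataset.iter_batches(batch_size=None)
    # Iterate over all batches to force the images to be read and decoded.
    for _ in dataset_iter:
        pass


if __name__ == "__main__":
    main()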
Exception raised:
File "/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.11/site-packages/PIL/Image.py", line 3120, in _open_core
_decompression_bomb_check(im.size)
File "/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.11/site-packages/PIL/Image.py", line 3029, in _decompression_bomb_check
raise DecompressionBombError(
PIL.Image.DecompressionBombError: Image size (47160 pixels) exceeds limit of 8 pixels, could be decompression bomb DOS attack.
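For comparison, outside Ray the same limit lives in the module-level attribute PIL.Image.MAX_IMAGE_PIXELS. The sketch below shows one way to scope an override so it is restored afterwards. It is not what this PR implements: `pixel_limit` is a hypothetical helper, and a `SimpleNamespace` stands in for the `PIL.Image` module so the example runs without Pillow installed.

```python
import contextlib
from types import SimpleNamespace


@contextlib.contextmanager
def pixel_limit(image_module, limit):
    """Temporarily set image_module.MAX_IMAGE_PIXELS, restoring it on exit.

    Hypothetical helper: image_module is expected to be PIL.Image, or any
    object exposing a MAX_IMAGE_PIXELS attribute.
    """
    previous = image_module.MAX_IMAGE_PIXELS
    image_module.MAX_IMAGE_PIXELS = limit
    try:
        yield image_module
    finally:
        # Restore the old limit even if decoding raised an exception.
        image_module.MAX_IMAGE_PIXELS = previous


if __name__ == "__main__":
    # Stand-in for `from PIL import Image` (assumption: Pillow not required here).
    fake_image = SimpleNamespace(MAX_IMAGE_PIXELS=89478485)
    with pixel_limit(fake_image, 4):
        print(fake_image.MAX_IMAGE_PIXELS)  # limit in effect inside the block
    print(fake_image.MAX_IMAGE_PIXELS)  # previous limit restored
```

Because the attribute is process-global, an override like this affects every thread decoding images in the same process, which is one reason to pass the limit through the reader rather than patching it ad hoc.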