Performance regression getting masked data in 1.4.1+ when boto3 is installed
Expected behavior and actual behavior.
Getting masked data using rasterio.mask.mask should perform approximately as fast in 1.4.1+ as in 1.4.0. In practice, with boto3 installed, 1.4.1 performs at about 20% of the speed of 1.4.0.
Steps to reproduce the problem.
- Install rasterio 1.4.1+
- Install boto3
- Set environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
- Perform rasterio.mask.mask on a large number of features
Because of this line: https://github.com/rasterio/rasterio/blob/7f8bda2e32df5b5eefc6e1ea6b5f04a946567ede/rasterio/_io.pyx#L2256, introduced in 1.4.1 as far as I can tell, each call to mask will create a new Env, which in turn will create a new boto3 Session if boto3 is installed and AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set.
From my tests, running a large number of mask operations (500k+) for "small" features, the majority of the time is now spent creating AWS sessions instead of actually masking. Note that the data I'm masking is on a local disk, not on AWS, but I need the AWS configuration in other parts of the same app. In my app this reduces throughput from ~300 features/second to ~60 features/second, so the performance drop is pretty dramatic.
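Since the session is only created when both AWS variables are set, one possible stopgap is to hide them for the duration of the masking loop and restore them afterwards. The following is a minimal sketch of that idea, not a confirmed fix: `without_env` is a hypothetical helper, it has not been verified against rasterio 1.4.1, and it assumes the environment check is the only trigger for session creation.

```python
import os
from contextlib import contextmanager


@contextmanager
def without_env(*names):
    """Temporarily remove environment variables (hypothetical helper)."""
    # Remember only the variables that are actually set right now.
    saved = {name: os.environ.pop(name) for name in names if name in os.environ}
    try:
        yield
    finally:
        # Put the original values back, even if the body raised.
        os.environ.update(saved)
```

Usage would be to wrap the local-only work, e.g. `with without_env("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):` around the loop that calls rasterio.mask.mask. Note that os.environ is process-wide, so this is not safe if other threads need the AWS configuration at the same time.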
Partial stack trace showing where the boto3 session gets created during a mask call:
File "/usr/local/lib/python3.10/dist-packages/rasterio/mask.py", line 180, in mask
shape_mask, transform, window = raster_geometry_mask(
File "/usr/local/lib/python3.10/dist-packages/rasterio/mask.py", line 106, in raster_geometry_mask
mask = geometry_mask(shapes, transform=transform, invert=invert,
File "/usr/local/lib/python3.10/dist-packages/rasterio/env.py", line 411, in wrapper
return f(*args, **kwds)
File "/usr/local/lib/python3.10/dist-packages/rasterio/features.py", line 69, in geometry_mask
return rasterize(
File "/usr/local/lib/python3.10/dist-packages/rasterio/env.py", line 411, in wrapper
return f(*args, **kwds)
File "/usr/local/lib/python3.10/dist-packages/rasterio/features.py", line 392, in rasterize
_rasterize(valid_shapes, out, transform, all_touched, merge_alg)
File "rasterio/_features.pyx", line 384, in rasterio._features._rasterize
File "rasterio/_io.pyx", line 2254, in rasterio._io.MemoryDataset.__init__
File "rasterio/_io.pyx", line 2256, in rasterio._io.MemoryDataset.__init__
File "/usr/local/lib/python3.10/dist-packages/rasterio/env.py", line 206, in __init__
self.session = Session.from_environ()
File "/usr/local/lib/python3.10/dist-packages/rasterio/session.py", line 191, in from_environ
session = Session.aws_or_dummy(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/rasterio/session.py", line 168, in aws_or_dummy
return AWSSession(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/rasterio/session.py", line 283, in __init__
self._session = boto3.Session(
File "/usr/local/lib/python3.10/dist-packages/boto3/session.py", line 108, in __init__
self._setup_loader()
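To check whether session creation dominates in another setup, a batch of mask calls can be profiled with the standard library. This is a minimal sketch using cProfile; `profile_top` is a hypothetical helper name, and the callable passed in is whatever function runs your own masking loop.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit=15, **kwargs):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()
    # Render the statistics into a string instead of printing to stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```

Calling `print(profile_top(run_masks))`, where `run_masks` is an assumed function of your own that masks a few thousand features, should list frames from rasterio/session.py and boto3/session.py near the top if the behavior in the trace above applies.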
Environment Information
rasterio info:
rasterio: 1.4.0
GDAL: 3.9.2
PROJ: 9.4.1
GEOS: 3.11.1
PROJ DATA: /usr/local/lib/python3.10/dist-packages/rasterio/proj_data
GDAL DATA: /usr/local/lib/python3.10/dist-packages/rasterio/gdal_data
System:
python: 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]
executable: /usr/bin/python3
machine: Linux-6.8.0-57-generic-x86_64-with-glibc2.35
Python deps:
affine: 2.4.0
attrs: 25.3.0
certifi: 2025.01.31
click: 8.1.8
cligj: 0.7.2
cython: None
numpy: 1.26.4
click-plugins: None
setuptools: 59.6.0
Sorry, I obviously copied the environment information from the wrong container. Here's the info from the one where I actually see the problem:
rasterio info:
rasterio: 1.4.1
GDAL: 3.9.2
PROJ: 9.4.1
GEOS: 3.11.1
PROJ DATA: /usr/local/lib/python3.10/dist-packages/rasterio/proj_data
GDAL DATA: /usr/local/lib/python3.10/dist-packages/rasterio/gdal_data
System:
python: 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]
executable: /usr/bin/python3
machine: Linux-6.8.0-57-generic-x86_64-with-glibc2.35
Python deps:
affine: 2.4.0
attrs: 25.3.0
certifi: 2025.01.31
click: 8.1.8
cligj: 0.7.2
cython: None
numpy: 1.26.4
click-plugins: None
setuptools: 59.6.0
Thanks for the report, @perliedman!