Performance regression getting masked data in 1.4.1+ when boto3 is installed

Status: Open • perliedman opened this issue 11 months ago • 3 comments

Expected behavior and actual behavior.

Getting masked data using rasterio.mask.mask should be approximately as fast in 1.4.1+ as in 1.4.0. In practice, with boto3 installed and AWS credentials set in the environment, 1.4.1 runs at about 20% of the speed of 1.4.0 (roughly 5x slower).

Steps to reproduce the problem.

  • Install rasterio 1.4.1+
  • Install boto3
  • Set environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  • Perform rasterio.mask.mask on a large number of features

Because of this line: https://github.com/rasterio/rasterio/blob/7f8bda2e32df5b5eefc6e1ea6b5f04a946567ede/rasterio/_io.pyx#L2256, which as far as I can tell was introduced in 1.4.1, each call to mask creates a new Env (in MemoryDataset.__init__, which rasterize uses internally). That Env in turn creates a new boto3 Session whenever boto3 is installed and AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set.
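A quick way to check whether a process is exposed to this slow path is to test the two conditions described above. The helper below is a hypothetical sketch written for this report, not part of rasterio's API: it only mirrors the stated conditions (boto3 importable, both AWS variables set) and does not inspect rasterio's actual session-selection logic.

```python
import importlib.util
import os
from typing import Mapping, Optional


def may_create_aws_session_per_call(
    environ: Optional[Mapping[str, str]] = None,
    boto3_available: Optional[bool] = None,
) -> bool:
    """Hypothetical check mirroring the conditions described in this report.

    Returns True when boto3 can be imported and both AWS credential
    variables are present, i.e. when each new Env may build a boto3 Session.
    """
    if environ is None:
        environ = os.environ
    if boto3_available is None:
        # find_spec only locates the package; it does not import boto3.
        boto3_available = importlib.util.find_spec("boto3") is not None
    return (
        boto3_available
        and "AWS_ACCESS_KEY_ID" in environ
        and "AWS_SECRET_ACCESS_KEY" in environ
    )
```

Calling `may_create_aws_session_per_call()` with no arguments checks the current process; passing an explicit mapping and `boto3_available` flag makes the logic easy to test in isolation.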

From my tests, when running a large number of mask operations (500k+) on "small" features, most of the time is now spent creating AWS sessions instead of doing the actual masking. Note that the data I'm masking is on a local disk, not on AWS, but I need the AWS configuration in other parts of the same app. In my app this reduces throughput from ~300 features/second to ~60 features/second, so the performance drop is dramatic.
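The throughput figures above translate directly into a per-feature cost. This back-of-the-envelope calculation is plain arithmetic on the reported numbers (not a new measurement), and it assumes the whole difference between the two rates is overhead added per feature:

```python
def per_feature_overhead_s(fast_fps: float, slow_fps: float) -> float:
    """Extra wall-clock seconds spent per feature on the slow path."""
    return 1.0 / slow_fps - 1.0 / fast_fps


fast_fps, slow_fps = 300.0, 60.0  # features/second reported above
n_features = 500_000              # "500k+" mask operations

overhead_s = per_feature_overhead_s(fast_fps, slow_fps)
extra_hours = n_features * overhead_s / 3600

print(f"extra per feature: {overhead_s * 1000:.1f} ms")                # 13.3 ms
print(f"relative speed: {slow_fps / fast_fps:.0%}")                     # 20%
print(f"extra time for {n_features:,} features: {extra_hours:.1f} h")  # 1.9 h
```

Under that assumption, about 13.3 ms of each feature's 16.7 ms (roughly 80%) is overhead, which is consistent with session creation taking most of the run time.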

Partial stack trace showing where this happens:

  File "/usr/local/lib/python3.10/dist-packages/rasterio/mask.py", line 180, in mask
    shape_mask, transform, window = raster_geometry_mask(
  File "/usr/local/lib/python3.10/dist-packages/rasterio/mask.py", line 106, in raster_geometry_mask
    mask = geometry_mask(shapes, transform=transform, invert=invert,
  File "/usr/local/lib/python3.10/dist-packages/rasterio/env.py", line 411, in wrapper
    return f(*args, **kwds)
  File "/usr/local/lib/python3.10/dist-packages/rasterio/features.py", line 69, in geometry_mask
    return rasterize(
  File "/usr/local/lib/python3.10/dist-packages/rasterio/env.py", line 411, in wrapper
    return f(*args, **kwds)
  File "/usr/local/lib/python3.10/dist-packages/rasterio/features.py", line 392, in rasterize
    _rasterize(valid_shapes, out, transform, all_touched, merge_alg)
  File "rasterio/_features.pyx", line 384, in rasterio._features._rasterize
  File "rasterio/_io.pyx", line 2254, in rasterio._io.MemoryDataset.__init__
  File "rasterio/_io.pyx", line 2256, in rasterio._io.MemoryDataset.__init__
  File "/usr/local/lib/python3.10/dist-packages/rasterio/env.py", line 206, in __init__
    self.session = Session.from_environ()
  File "/usr/local/lib/python3.10/dist-packages/rasterio/session.py", line 191, in from_environ
    session = Session.aws_or_dummy(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/rasterio/session.py", line 168, in aws_or_dummy
    return AWSSession(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/rasterio/session.py", line 283, in __init__
    self._session = boto3.Session(
  File "/usr/local/lib/python3.10/dist-packages/boto3/session.py", line 108, in __init__
    self._setup_loader()
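The trace shows the pattern behind the slowdown: a helper called once per feature builds a new Env, and each Env builds a new session. The toy model below is a hypothetical sketch, not rasterio code. `FakeSession` stands in for an expensive object such as a boto3 Session and simply counts how often it is constructed, so the difference between creating a session per call and reusing one is visible without AWS or rasterio installed.

```python
class FakeSession:
    """Hypothetical stand-in for an expensive session (e.g. boto3.Session)."""

    instances = 0

    def __init__(self) -> None:
        # A real session would do costly setup here (e.g. loading data files).
        FakeSession.instances += 1


def mask_feature_new_session(feature: int) -> int:
    """Mimics the slow path: a fresh session is built on every call."""
    FakeSession()
    return feature * 2  # placeholder for the actual masking work


def mask_feature_shared_session(feature: int, session: FakeSession) -> int:
    """Same work, but the caller supplies one long-lived session."""
    return feature * 2  # placeholder for the actual masking work


def count_sessions(n_features: int, reuse: bool) -> int:
    """Process n_features and return how many sessions were constructed."""
    FakeSession.instances = 0
    if reuse:
        session = FakeSession()
        for f in range(n_features):
            mask_feature_shared_session(f, session)
    else:
        for f in range(n_features):
            mask_feature_new_session(f)
    return FakeSession.instances


print(count_sessions(1000, reuse=False))  # 1000
print(count_sessions(1000, reuse=True))   # 1
```

With per-call construction the session cost scales with the number of features; with reuse it is paid once. This only illustrates the cost model and is not a proposal for how rasterio itself should be changed.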

Environment Information

rasterio info:
  rasterio: 1.4.0
      GDAL: 3.9.2
      PROJ: 9.4.1
      GEOS: 3.11.1
 PROJ DATA: /usr/local/lib/python3.10/dist-packages/rasterio/proj_data
 GDAL DATA: /usr/local/lib/python3.10/dist-packages/rasterio/gdal_data

System:
    python: 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]
executable: /usr/bin/python3
   machine: Linux-6.8.0-57-generic-x86_64-with-glibc2.35

Python deps:
    affine: 2.4.0
     attrs: 25.3.0
   certifi: 2025.01.31
     click: 8.1.8
     cligj: 0.7.2
    cython: None
     numpy: 1.26.4
click-plugins: None
setuptools: 59.6.0

perliedman, Apr 22 '25 18:04

Sorry, I obviously copied the environment information from the wrong container. Here's the info for the one where I actually see the problem:

rasterio info:
  rasterio: 1.4.1
      GDAL: 3.9.2
      PROJ: 9.4.1
      GEOS: 3.11.1
 PROJ DATA: /usr/local/lib/python3.10/dist-packages/rasterio/proj_data
 GDAL DATA: /usr/local/lib/python3.10/dist-packages/rasterio/gdal_data

System:
    python: 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]
executable: /usr/bin/python3
   machine: Linux-6.8.0-57-generic-x86_64-with-glibc2.35

Python deps:
    affine: 2.4.0
     attrs: 25.3.0
   certifi: 2025.01.31
     click: 8.1.8
     cligj: 0.7.2
    cython: None
     numpy: 1.26.4
click-plugins: None
setuptools: 59.6.0

perliedman, Apr 22 '25 19:04

Thanks for the report, @perliedman!

sgillies, Apr 22 '25 20:04