Plugin lookup is slow

Open sgillies opened this issue 9 years ago • 5 comments

I've suspected that there's a price to pay for pluggability. Benchmarking rio-warp with our small RGB.byte.tif file, I'm finding that time rio warp tests/data/RGB.byte.tif runs twice as fast if rio-warp is the only registered rasterio.rio_commands entry point.

Profiling is needed to see what we can do about this.
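
As a starting point for that profiling, here is a minimal sketch that wraps an entry point lookup in cProfile and returns the most expensive calls. It is an illustration under assumptions, not rasterio's code: it uses the standard library's importlib.metadata instead of pkg_resources so that it runs without setuptools, and the helper names list_entry_points and profile_lookup are hypothetical.

```python
import cProfile
import io
import pstats
from importlib import metadata


def list_entry_points(group):
    """Return the entry points registered under `group`."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Newer Pythons expose a selectable collection.
        return list(eps.select(group=group))
    # Older Pythons return a plain dict keyed by group name.
    return list(eps.get(group, []))


def profile_lookup(group, top=10):
    """Profile one lookup and return (entry_points, text_report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    found = list_entry_points(group)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the slowest call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return found, out.getvalue()
```

Running profile_lookup("rasterio.rio_commands") in both a crowded and a fresh virtualenv and comparing the two reports should indicate where the extra time goes.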

sgillies, May 19 '16 17:05

A couple of data points:

My Python 2.7 dev virtualenv

It's got a lot of dev installs:

(pydotorg27)MapBox-FC:rasterio sean$ python -c "import sys;print(sys.path)"
['', '/Users/sean/code/mercantile', '/Users/sean/code/make-surface', '/Users/sean/code/Fiona', '/Users/sean/code/geojsonio.py', '/Users/sean/code/snuggs', '/Users/sean/code/pygeobuf', '/Users/sean/code/python-uploads-client', '/Users/sean/code/Shapely', '/Users/sean/code/platter', '/Users/sean/code/rio-mucho', '/Users/sean/code/tilecontrol', '/Users/sean/code/mbx-cli', '/Users/sean/code/rasterio', '/Users/sean/code/descartes', '/Users/sean/code/mapbox-cli-py', '/Users/sean/code/mapbox-sdk-py', '/Users/sean/code/fio-mapbox', '/Users/sean/code/affine', '/Users/sean/code/aws-cli', '/Users/sean/envs/pydotorg27/lib/python27.zip', '/Users/sean/envs/pydotorg27/lib/python2.7', '/Users/sean/envs/pydotorg27/lib/python2.7/plat-darwin', '/Users/sean/envs/pydotorg27/lib/python2.7/plat-mac', '/Users/sean/envs/pydotorg27/lib/python2.7/plat-mac/lib-scriptpackages', '/Users/sean/envs/pydotorg27/lib/python2.7/lib-tk', '/Users/sean/envs/pydotorg27/lib/python2.7/lib-old', '/Users/sean/envs/pydotorg27/lib/python2.7/lib-dynload', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-darwin', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac/lib-scriptpackages', '/Users/sean/envs/pydotorg27/lib/python2.7/site-packages']
(pydotorg27)MapBox-FC:rasterio sean$ time rio --help | tail
  insp       Open a data file and start an interpreter.
  mask       Mask in raster using features.
  merge      Merge a stack of raster datasets.
  overview   Construct overviews in an existing dataset.
  rasterize  Rasterize features.
  sample     Sample a dataset.
  shapes     Write shapes extracted from bands or masks.
  stack      Stack a number of bands into a multiband dataset.
  transform  Transform coordinates.
  warp       Warp a raster dataset.

real    0m1.103s
user    0m0.881s
sys     0m0.223s

A fresh Python 2.7 virtualenv

With nothing installed other than pip and rasterio

(test_rio27)MapBox-FC:~ sean$ python -c "import sys;print(sys.path)"
['', '/Users/sean/envs/test_rio27/lib/python27.zip', '/Users/sean/envs/test_rio27/lib/python2.7', '/Users/sean/envs/test_rio27/lib/python2.7/plat-darwin', '/Users/sean/envs/test_rio27/lib/python2.7/plat-mac', '/Users/sean/envs/test_rio27/lib/python2.7/plat-mac/lib-scriptpackages', '/Users/sean/envs/test_rio27/lib/python2.7/lib-tk', '/Users/sean/envs/test_rio27/lib/python2.7/lib-old', '/Users/sean/envs/test_rio27/lib/python2.7/lib-dynload', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-darwin', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac/lib-scriptpackages', '/Users/sean/envs/test_rio27/lib/python2.7/site-packages']
(test_rio27)MapBox-FC:~ sean$ time rio --help | tail
  insp       Open a data file and start an interpreter.
  mask       Mask in raster using features.
  merge      Merge a stack of raster datasets.
  overview   Construct overviews in an existing dataset.
  rasterize  Rasterize features.
  sample     Sample a dataset.
  shapes     Write shapes extracted from bands or masks.
  stack      Stack a number of bands into a multiband dataset.
  transform  Transform coordinates.
  warp       Warp a raster dataset.

real    0m0.262s
user    0m0.190s
sys     0m0.070s

sgillies, May 19 '16 19:05

Comparing the time it takes to list entry points in a pkg_resources working set, I find:

My Python 2.7 dev virtualenv

(pydotorg27)MapBox-FC:rasterio sean$ time python -c "import pkg_resources;print(list(pkg_resources.iter_entry_points('rasterio.rio_commands')))"
[EntryPoint.parse('info = rasterio.rio.info:info'), EntryPoint.parse('sample = rasterio.rio.sample:sample'), EntryPoint.parse('convert = rasterio.rio.convert:convert'), EntryPoint.parse('rasterize = rasterio.rio.rasterize:rasterize'), EntryPoint.parse('clip = rasterio.rio.clip:clip'), EntryPoint.parse('mask = rasterio.rio.mask:mask'), EntryPoint.parse('edit-info = rasterio.rio.edit_info:edit'), EntryPoint.parse('bounds = rasterio.rio.bounds:bounds'), EntryPoint.parse('warp = rasterio.rio.warp:warp'), EntryPoint.parse('transform = rasterio.rio.transform:transform'), EntryPoint.parse('insp = rasterio.rio.insp:insp'), EntryPoint.parse('merge = rasterio.rio.merge:merge'), EntryPoint.parse('env = rasterio.rio.env:env'), EntryPoint.parse('overview = rasterio.rio.overview:overview'), EntryPoint.parse('shapes = rasterio.rio.shapes:shapes'), EntryPoint.parse('calc = rasterio.rio.calc:calc'), EntryPoint.parse('stack = rasterio.rio.stack:stack')]

real    0m0.162s
user    0m0.113s
sys     0m0.047s

A fresh Python 2.7 virtualenv

(test_rio27)MapBox-FC:~ sean$ time python -c "import pkg_resources;print(list(pkg_resources.iter_entry_points('rasterio.rio_commands')))"
[EntryPoint.parse('info = rasterio.rio.info:info'), EntryPoint.parse('sample = rasterio.rio.sample:sample'), EntryPoint.parse('convert = rasterio.rio.convert:convert'), EntryPoint.parse('rasterize = rasterio.rio.rasterize:rasterize'), EntryPoint.parse('clip = rasterio.rio.clip:clip'), EntryPoint.parse('mask = rasterio.rio.mask:mask'), EntryPoint.parse('edit-info = rasterio.rio.edit_info:edit'), EntryPoint.parse('bounds = rasterio.rio.bounds:bounds'), EntryPoint.parse('warp = rasterio.rio.warp:warp'), EntryPoint.parse('transform = rasterio.rio.transform:transform'), EntryPoint.parse('insp = rasterio.rio.insp:insp'), EntryPoint.parse('merge = rasterio.rio.merge:merge'), EntryPoint.parse('env = rasterio.rio.env:env'), EntryPoint.parse('overview = rasterio.rio.overview:overview'), EntryPoint.parse('shapes = rasterio.rio.shapes:shapes'), EntryPoint.parse('calc = rasterio.rio.calc:calc'), EntryPoint.parse('stack = rasterio.rio.stack:stack')]

real    0m0.080s
user    0m0.060s
sys     0m0.018s
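
The same comparison can be made in-process, which takes interpreter startup out of the measurement. The sketch below is an illustration under assumptions: it scans with importlib.metadata rather than pkg_resources, the function name time_scan is hypothetical, and the path argument exists only so that a long sys.path like the dev virtualenv's can be compared against a shorter one.

```python
import sys
import time
from importlib import metadata


def time_scan(group, path=None, repeat=5):
    """Return (best_seconds, n_found) for one entry point group.

    `path` is the list of directories to search; it defaults to a
    copy of sys.path. The best of `repeat` runs is reported to
    reduce noise from disk caching.
    """
    search_path = list(sys.path if path is None else path)
    best = float("inf")
    count = 0
    for _ in range(repeat):
        start = time.perf_counter()
        # Walk every distribution found on the search path and count
        # the entry points that belong to the requested group.
        count = sum(
            1
            for dist in metadata.distributions(path=search_path)
            for ep in dist.entry_points
            if ep.group == group
        )
        best = min(best, time.perf_counter() - start)
    return best, count
```

Calling time_scan("rasterio.rio_commands") once with the full sys.path and once with only the site-packages directory would show how much of the lookup cost comes from the extra path entries.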

sgillies, May 19 '16 19:05

In our CLI's main function, we're making two passes over the working set to find plugins, because there are two entry point groups. One pass would be faster. Also, we should look into whether we can avoid loading unnecessary plugins. rio --help needs to load them all to get their short help texts, but rio info --help shouldn't have to load any rasterio.rio_plugins (3rd party) entry points, and maybe not any of the other rasterio.rio_commands entry points either. /cc @geowurster
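
Both ideas can be sketched with the standard library alone. This is not how rasterio's main function is written; it is a hedged illustration that swaps pkg_resources for importlib.metadata, and collect_entry_points and LazyCommands are hypothetical names. The function gathers several entry point groups in one pass over the installed distributions, and the class keeps entry points unloaded until a command is actually requested.

```python
from importlib import metadata


def collect_entry_points(groups):
    """Gather entry points for all `groups` in a single scan."""
    wanted = set(groups)
    found = {group: [] for group in groups}
    for dist in metadata.distributions():
        for ep in dist.entry_points:
            if ep.group in wanted:
                found[ep.group].append(ep)
    return found


class LazyCommands:
    """Map command names to entry points; import only on demand."""

    def __init__(self, entry_points):
        self._entry_points = {ep.name: ep for ep in entry_points}
        self._loaded = {}

    def names(self):
        """List command names without importing any plugin module."""
        return sorted(self._entry_points)

    def loaded(self):
        """List the names whose modules have been imported so far."""
        return sorted(self._loaded)

    def get(self, name):
        """Import and cache the object behind one entry point."""
        if name not in self._loaded:
            self._loaded[name] = self._entry_points[name].load()
        return self._loaded[name]
```

With something along these lines, rio info --help would only need get("info"), while rio --help would still have to load every command to read its short help text.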

sgillies, May 19 '16 19:05

@sgillies I have noticed this occasionally as well, and I suspect we may have to manage our CLI functions through the normal click API, decorating them with @rasterio.rio.main.main_group() rather than registering all of our $ rio commands as entry points.

geowurster, May 25 '16 17:05

This issue is tracked at https://github.com/pypa/setuptools/issues/510, and a recent discussion at https://github.com/pypa/setuptools/issues/510#issuecomment-368125053 has some suggestions.

geowurster, Feb 25 '18 18:02