rasterio
Plugin lookup is slow
I've suspected that there's a price to pay for pluggability. Benchmarking rio-warp with our small RGB.byte.tif file, I'm finding that time rio warp tests/data/RGB.byte.tif runs twice as fast if rio-warp is the only registered rasterio.rio_commands entry point.
Profiling is needed to see what we can do about this.
A couple of data points:
My Python 2.7 dev virtualenv
It's got a lot of dev installs:
(pydotorg27)MapBox-FC:rasterio sean$ python -c "import sys;print(sys.path)"
['', '/Users/sean/code/mercantile', '/Users/sean/code/make-surface', '/Users/sean/code/Fiona', '/Users/sean/code/geojsonio.py', '/Users/sean/code/snuggs', '/Users/sean/code/pygeobuf', '/Users/sean/code/python-uploads-client', '/Users/sean/code/Shapely', '/Users/sean/code/platter', '/Users/sean/code/rio-mucho', '/Users/sean/code/tilecontrol', '/Users/sean/code/mbx-cli', '/Users/sean/code/rasterio', '/Users/sean/code/descartes', '/Users/sean/code/mapbox-cli-py', '/Users/sean/code/mapbox-sdk-py', '/Users/sean/code/fio-mapbox', '/Users/sean/code/affine', '/Users/sean/code/aws-cli', '/Users/sean/envs/pydotorg27/lib/python27.zip', '/Users/sean/envs/pydotorg27/lib/python2.7', '/Users/sean/envs/pydotorg27/lib/python2.7/plat-darwin', '/Users/sean/envs/pydotorg27/lib/python2.7/plat-mac', '/Users/sean/envs/pydotorg27/lib/python2.7/plat-mac/lib-scriptpackages', '/Users/sean/envs/pydotorg27/lib/python2.7/lib-tk', '/Users/sean/envs/pydotorg27/lib/python2.7/lib-old', '/Users/sean/envs/pydotorg27/lib/python2.7/lib-dynload', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-darwin', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac/lib-scriptpackages', '/Users/sean/envs/pydotorg27/lib/python2.7/site-packages']
(pydotorg27)MapBox-FC:rasterio sean$ time rio --help | tail
insp Open a data file and start an interpreter.
mask Mask in raster using features.
merge Merge a stack of raster datasets.
overview Construct overviews in an existing dataset.
rasterize Rasterize features.
sample Sample a dataset.
shapes Write shapes extracted from bands or masks.
stack Stack a number of bands into a multiband dataset.
transform Transform coordinates.
warp Warp a raster dataset.
real 0m1.103s
user 0m0.881s
sys 0m0.223s
A fresh Python 2.7 virtualenv
With nothing installed other than pip and rasterio:
(test_rio27)MapBox-FC:~ sean$ python -c "import sys;print(sys.path)"
['', '/Users/sean/envs/test_rio27/lib/python27.zip', '/Users/sean/envs/test_rio27/lib/python2.7', '/Users/sean/envs/test_rio27/lib/python2.7/plat-darwin', '/Users/sean/envs/test_rio27/lib/python2.7/plat-mac', '/Users/sean/envs/test_rio27/lib/python2.7/plat-mac/lib-scriptpackages', '/Users/sean/envs/test_rio27/lib/python2.7/lib-tk', '/Users/sean/envs/test_rio27/lib/python2.7/lib-old', '/Users/sean/envs/test_rio27/lib/python2.7/lib-dynload', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-darwin', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac', '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac/lib-scriptpackages', '/Users/sean/envs/test_rio27/lib/python2.7/site-packages']
(test_rio27)MapBox-FC:~ sean$ time rio --help | tail
insp Open a data file and start an interpreter.
mask Mask in raster using features.
merge Merge a stack of raster datasets.
overview Construct overviews in an existing dataset.
rasterize Rasterize features.
sample Sample a dataset.
shapes Write shapes extracted from bands or masks.
stack Stack a number of bands into a multiband dataset.
transform Transform coordinates.
warp Warp a raster dataset.
real 0m0.262s
user 0m0.190s
sys 0m0.070s
Comparing the time it takes to list entry points in a pkg_resources working set, I find:
My Python 2.7 dev virtualenv
(pydotorg27)MapBox-FC:rasterio sean$ time python -c "import pkg_resources;print(list(pkg_resources.iter_entry_points('rasterio.rio_commands')))"
[EntryPoint.parse('info = rasterio.rio.info:info'), EntryPoint.parse('sample = rasterio.rio.sample:sample'), EntryPoint.parse('convert = rasterio.rio.convert:convert'), EntryPoint.parse('rasterize = rasterio.rio.rasterize:rasterize'), EntryPoint.parse('clip = rasterio.rio.clip:clip'), EntryPoint.parse('mask = rasterio.rio.mask:mask'), EntryPoint.parse('edit-info = rasterio.rio.edit_info:edit'), EntryPoint.parse('bounds = rasterio.rio.bounds:bounds'), EntryPoint.parse('warp = rasterio.rio.warp:warp'), EntryPoint.parse('transform = rasterio.rio.transform:transform'), EntryPoint.parse('insp = rasterio.rio.insp:insp'), EntryPoint.parse('merge = rasterio.rio.merge:merge'), EntryPoint.parse('env = rasterio.rio.env:env'), EntryPoint.parse('overview = rasterio.rio.overview:overview'), EntryPoint.parse('shapes = rasterio.rio.shapes:shapes'), EntryPoint.parse('calc = rasterio.rio.calc:calc'), EntryPoint.parse('stack = rasterio.rio.stack:stack')]
real 0m0.162s
user 0m0.113s
sys 0m0.047s
A fresh Python 2.7 virtualenv
(test_rio27)MapBox-FC:~ sean$ time python -c "import pkg_resources;print(list(pkg_resources.iter_entry_points('rasterio.rio_commands')))"
[EntryPoint.parse('info = rasterio.rio.info:info'), EntryPoint.parse('sample = rasterio.rio.sample:sample'), EntryPoint.parse('convert = rasterio.rio.convert:convert'), EntryPoint.parse('rasterize = rasterio.rio.rasterize:rasterize'), EntryPoint.parse('clip = rasterio.rio.clip:clip'), EntryPoint.parse('mask = rasterio.rio.mask:mask'), EntryPoint.parse('edit-info = rasterio.rio.edit_info:edit'), EntryPoint.parse('bounds = rasterio.rio.bounds:bounds'), EntryPoint.parse('warp = rasterio.rio.warp:warp'), EntryPoint.parse('transform = rasterio.rio.transform:transform'), EntryPoint.parse('insp = rasterio.rio.insp:insp'), EntryPoint.parse('merge = rasterio.rio.merge:merge'), EntryPoint.parse('env = rasterio.rio.env:env'), EntryPoint.parse('overview = rasterio.rio.overview:overview'), EntryPoint.parse('shapes = rasterio.rio.shapes:shapes'), EntryPoint.parse('calc = rasterio.rio.calc:calc'), EntryPoint.parse('stack = rasterio.rio.stack:stack')]
real 0m0.080s
user 0m0.060s
sys 0m0.018s
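The shell timings above include interpreter startup and the cost of importing pkg_resources itself. To isolate the lookup, the enumeration can be timed in-process. This is only a minimal sketch under stated assumptions: it swaps in the standard library's importlib.metadata instead of the pkg_resources call used above, and entry_points_for and time_lookup are hypothetical helper names, not part of rasterio.

```python
import time
from importlib.metadata import entry_points


def entry_points_for(group):
    """Return the entry points registered under `group` without loading them."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8 and 3.9 return a plain dict


def time_lookup(group, repeat=3):
    """Time `repeat` enumerations of `group`; return (timings, entry point count).

    The first timing is the closest to what a fresh process pays. Later ones
    may benefit from caches inside importlib.metadata.
    """
    timings = []
    found = []
    for _ in range(repeat):
        start = time.perf_counter()
        found = entry_points_for(group)
        timings.append(time.perf_counter() - start)
    return timings, len(found)


if __name__ == "__main__":
    timings, count = time_lookup("rasterio.rio_commands")
    print(count, "entry points;", ", ".join(f"{t:.4f}s" for t in timings))
```

Running it in both virtualenvs should show whether the gap tracks the number of installed distributions on sys.path, which is what the two transcripts above suggest.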
In our CLI's main function, we're making two passes over the working set to find plugins, because there are two entry point groups. One pass would be faster. Also, we should look into whether we can avoid loading unnecessary plugins. rio --help needs to load them all to get their short help texts, but rio info --help shouldn't have to load any rasterio.rio_plugins (third-party) entry points, and maybe not any of the other rasterio.rio_commands entry points. /cc @geowurster
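Both ideas, one pass over the entry points and loading a plugin only when its command is requested, can be sketched roughly as follows. This is not rasterio's actual main function: bucket_by_group and LazyCommands are hypothetical names, the sketch uses importlib.metadata.EntryPoint rather than pkg_resources, and the stand-in entry points resolve to standard library callables so that it runs without rasterio installed.

```python
from collections import defaultdict
from importlib.metadata import EntryPoint

GROUPS = ("rasterio.rio_commands", "rasterio.rio_plugins")


def bucket_by_group(entry_points, wanted=GROUPS):
    """Sort entry points into their groups in a single pass, loading nothing."""
    buckets = defaultdict(dict)
    for ep in entry_points:
        if ep.group in wanted:
            buckets[ep.group][ep.name] = ep
    return buckets


class LazyCommands:
    """Hypothetical registry that imports a command only when it is requested."""

    def __init__(self, entry_points_by_name):
        self._entry_points = dict(entry_points_by_name)
        self.loaded = {}

    def names(self):
        # Listing names needs no imports, only the entry point metadata.
        return sorted(self._entry_points)

    def get(self, name):
        # Import the target on first use and cache it for later calls.
        if name not in self.loaded:
            self.loaded[name] = self._entry_points[name].load()
        return self.loaded[name]


if __name__ == "__main__":
    # Stand-in entry points (assumed values) pointing at stdlib callables.
    fake = [
        EntryPoint(name="info", value="json:dumps", group="rasterio.rio_commands"),
        EntryPoint(name="warp", value="json:loads", group="rasterio.rio_commands"),
        EntryPoint(name="extra", value="textwrap:dedent", group="rasterio.rio_plugins"),
        EntryPoint(name="other", value="os:getcwd", group="console_scripts"),
    ]
    buckets = bucket_by_group(fake)
    core = LazyCommands(buckets["rasterio.rio_commands"])
    core.get("info")  # only this one command gets imported
    print(core.names(), sorted(core.loaded))
```

With this shape, something like rio info --help would only need core.get("info"). rio --help would still have to load every command to collect the short help texts, as noted above.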
@sgillies I have noticed this occasionally as well, and I suspect we may have to manage our CLI functions through the normal click API by decorating with @rasterio.rio.main.main_group() rather than registering all our $ rio commands as entry points.
This issue is tracked at https://github.com/pypa/setuptools/issues/510, and a recent discussion at https://github.com/pypa/setuptools/issues/510#issuecomment-368125053 has some suggestions.