arcgis-python-api
Lighten or make the heavy package dependencies optional
Is your feature request related to a problem? Please describe. My problem is that the dependencies of the arcgis package are so big and heavy that it's extremely onerous to incorporate arcgis into an existing Python application, even though the features of arcgis we use often don't require any of the really heavy dependencies (jupyter, pillow, ipywidgets, pandas, dask, numpy, matplotlib, etc.)
E.g. dependencies for arcgis 2.2.0
dependencies = [
"pillow",
"urllib3>=1.21.1,<3",
"cachetools",
"lxml",
"notebook",
"cryptography",
"ipywidgets >=7,<8",
"widgetsnbextension >=3",
"jupyter-client <=6.1.12",
"pandas >=2.0.0,<3",
"numpy >=1.21.6",
"matplotlib",
"keyring >=23.3.0",
"pylerc",
"ujson >=3",
"jupyterlab",
"python-certifi-win32;python_version<'3.10'",
"truststore>=0.7.0;python_version>'3.9'",
'pywin32 >=223;platform_system=="Windows"',
"pyshp >=2",
"geomet",
"requests >=2.27.1,<3",
"requests-oauthlib",
"requests_toolbelt",
"pyspnego >=0.8.0",
"requests-kerberos",
"requests-gssapi",
"dask >=2023.3.2",
"matplotlib-inline",
]
Describe the solution you'd like I'd like the arcgis package restructured or designed to make all the really heavy dependencies optional, so I can include only the ones I need for the functions I use.
Describe alternatives you've considered The first alternative I tried was continuing to use arcgis as is and tolerating the hundreds of megabytes of additional dependencies it would pull in that we didn't need, and the extra 10+ minutes of dependency resolving time this would incur every time we need to update our own dependencies.
The current alternative I use (which is very time consuming and painful for me) is to perform the operations I do need with arcgis, reverse engineer the requests it makes to the REST API, then re-implement the minimal parts of the arcgis package I need for my own application.
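As a rough illustration of how small such a re-implemented piece can be, here is a minimal sketch that builds the URL for a feature layer query using only the standard library. It assumes the layer's standard REST query operation with the where, outFields and f parameters; the function name build_query_url and the example layer URL are hypothetical, and authentication (tokens) is left out entirely.

```python
from urllib.parse import urlencode


def build_query_url(layer_url: str, where: str = "1=1", out_fields: str = "*") -> str:
    """Build a query URL for a feature layer REST endpoint.

    Hypothetical helper: only constructs the URL, it does not send the
    request or handle authentication.
    """
    params = {
        "where": where,           # SQL-style filter; "1=1" matches every feature
        "outFields": out_fields,  # "*" asks for all attribute fields
        "f": "json",              # ask the server for a JSON response
    }
    # urlencode percent-escapes the values so they are safe in a query string
    return f"{layer_url.rstrip('/')}/query?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder URL, not a real service
    print(build_query_url("https://example.com/arcgis/rest/services/Demo/FeatureServer/0"))
```

The resulting URL could then be fetched with requests or urllib.request, which is roughly all a simple monitoring script needs.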
@owenlamont Thanks for bringing this up, we are actively working on this and will have updates in the future to help with this workflow.
I think you could consider a similar approach to Dask. You could implement ArcGIS Core using one of the following commands:
pip install arcgis-core
or
conda install arcgis-core
Dask documentation for more information: https://docs.dask.org/en/stable/install.html
@hildermesmedeiros @owenlamont if you wouldn't mind, could you explain which areas of the Python API you mainly use and which areas you never touch?
For my current application it's CRUD operations around FeatureLayers. More specifically, I want to programmatically upload GeoJSON files, publish them as FeatureLayers, and replace the GeoJSON file currently associated with an existing FeatureLayer. In future there may be some other minor update operations around renaming, deleting, or updating metadata for those types of entities.
I know the story around optional dependencies in conda isn't so good (you probably have to publish multiple conda packages with different combinations of dependencies). Pip allows different dependency profiles where you could select the subset of dependencies you need, e.g.
pip install arcgis[all] or pip install arcgis[jupyter]
If you go down that path I'd suggest if a user just runs pip install arcgis they just get the minimal core dependencies.
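For what it's worth, here is a minimal sketch of how a package could guard its optional imports once the heavy dependencies move into extras. This is only an assumption about one possible design, not how arcgis is implemented: the helper name require_optional and the module-to-extra mapping are hypothetical, and the extra names simply reuse the jupyter and all examples above.

```python
import importlib

# Hypothetical mapping from an optional module to the extra that would
# provide it; anything not listed falls back to the "all" extra.
_EXTRA_FOR_MODULE = {
    "ipywidgets": "jupyter",
    "notebook": "jupyter",
}


def require_optional(module_name: str):
    """Import an optional dependency, or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        extra = _EXTRA_FOR_MODULE.get(module_name, "all")
        raise ModuleNotFoundError(
            f"{module_name!r} is an optional dependency; "
            f"install it with: pip install arcgis[{extra}]"
        ) from exc


def to_dataframe(records):
    """Example feature that only needs pandas when it is actually called."""
    pd = require_optional("pandas")  # imported lazily, not at package import time
    return pd.DataFrame(records)
```

With a guard like this, a plain pip install arcgis would still import cleanly, and a user would only hit the install hint when calling a feature whose dependency is missing.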
No, this is a great story and these are great suggestions. Thank you for sharing this.
Glad to hear it. With my above comment I just meant it's a bit trickier/harder to support optional dependencies in conda than it is in pip (even though in general I like conda better).
If you're interested I know there's been discussion in the conda community for supporting that style of pip optional dependencies here and later here. I'm not quite sure what the current state of it is.
Glad to hear it. With my above comment I just meant it's a bit trickier/harder to support optional dependencies in conda than it is in pip (even though in general I like conda better).
It is, we are looking into building a set of namespace packages for a future release. Having folks, like yourself, post how they use the API helps us understand how different users use the API. If you know of other use cases, please post more of them!
To do the minimal deps I do something similar to this guide: https://developers.arcgis.com/python/guide/install-and-set-up/
I'd like it if we had the conda install arcgis-core idea, but it might mean more work overhead for you.
With many customers using the cloud, the fight over cost is increasing, and file size is one of the many costs to be fought these days. At least on my side it has come up more frequently.
For example, a customer needed to verify that some key FeatureLayer features were being updated correctly. We created a script that runs every 5 minutes to check that everything is as expected; otherwise it creates it as JSON and GeoEvent sends an email.
Another client wanted to create a routing system using ArcGIS Server, but did not want the logic to be executed on the server. They wanted it to run in the cloud on the same infrastructure, and asked for it to be as small as possible and without a dependency on arcpy.
This kind of thing...
Thanks, I missed that minimal dependencies documentation. Although I don't think that would work for us: we use pip-tools to maintain our application dependencies, and you can't specify that a package be installed with --no-deps with that.
Some of those minimal requests dependencies are platform/authentication-type specific themselves, right? I know when I implemented a Python client to authenticate and connect with the REST API (which is deployed to a Linux Docker image) I did need requests and requests-oauthlib, but things like requests-kerberos and several of those other dependencies are just for Windows SSO auth?
@owenlamont, no. Most are not Windows-bound. Most are just protocols or ways of doing authentication or automation in the backend:
- requests-toolbelt: a collection of utilities that some users of python-requests might need but that do not belong in requests proper. How to keep the TCP connection alive? How to handle sockets? Those and more are implemented there.
- requests-kerberos: the Kerberos/GSSAPI authentication protocol in 3 ways. It's kind of safe, and there is auth on Linux, Windows, Mac... it is a protocol. The company's IT might implement it, and you will be bound to respect it.
- requests-oauthlib: a high-level implementation of OAuth 1 and OAuth 2 requests.
- requests-gssapi: it provides a fully backward-compatible shim for the old python-requests-kerberos library: simply replace import requests_kerberos with import requests_gssapi. Yeah, old stuff, life sucks. But I guess requests-kerberos could be used in this case, I don't know.
- requests_ntlm2: implements the Windows auth method NTLM. Windows does the favour of using Kerberos, but if things go wrong it falls back to old methods, just because it can, why not.
My Docker build takes 11 minutes, but most of that time (around 8 minutes) is spent installing ujson, which is required by arcgis.
My issue is that I want to use the arcgis library for its ArcGIS REST API client, in a server application, and all the dependencies bloat the container. If the heavy dependencies can't be spun out, it would be better to push the REST client out to a separate package that depends only on the minimum needed to interact with the REST API.
This improvement would be great. I also have an 11 minute container build. A common build pattern for containers is to pip install packages from a requirements.txt file, which doesn't support the --no-deps argument: https://github.com/pypa/pip/pull/10837
Hi everyone!
Thanks for all the input and opinions, please continue to provide your input as this helps us when choosing to prioritize enhancements. For this issue in particular, we have been actively working on it and there will be improvements and changes made to make the package lighter and split up in a more optimized way. We don't have a set timeline to provide as of now since there are some bigger changes involved but we will update the post when possible with more information.