arcgis-python-api

Lighten or make the heavy package dependencies optional

Open owenlamont opened this issue 1 year ago • 23 comments

Is your feature request related to a problem? Please describe. My problem is that the dependencies of the arcgis package are so big and heavy that it's extremely onerous to incorporate arcgis into an existing Python application, even though the features of arcgis we use often don't require any of the really heavy dependencies (jupyter, pillow, ipywidgets, pandas, dask, numpy, matplotlib, etc.).

E.g. dependencies for arcgis 2.2.0

dependencies = [
        "pillow",
        "urllib3>=1.21.1,<3",
        "cachetools",
        "lxml",
        "notebook",
        "cryptography",
        "ipywidgets >=7,<8",
        "widgetsnbextension >=3",
        "jupyter-client <=6.1.12",
        "pandas >=2.0.0,<3",
        "numpy >=1.21.6",
        "matplotlib",
        "keyring >=23.3.0",
        "pylerc",
        "ujson >=3",
        "jupyterlab",
        "python-certifi-win32;python_version<'3.10'",
        "truststore>=0.7.0;python_version>'3.9'",
        'pywin32 >=223;platform_system=="Windows"',
        "pyshp >=2",
        "geomet",
        "requests >=2.27.1,<3",
        "requests-oauthlib",
        "requests_toolbelt",
        "pyspnego >=0.8.0",
        "requests-kerberos",
        "requests-gssapi",
        "dask >=2023.3.2",
        "matplotlib-inline",
    ]

Describe the solution you'd like I'd like the arcgis package restructured or designed to make all the really heavy dependencies optional, so I can only include the ones I need for the functions I use.
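One common way to make a heavy dependency optional is to import it only inside the functions that need it. The following is a minimal sketch of that pattern, not how arcgis is actually structured: the helper `require_optional`, the function `to_dataframe`, and the extra name `dataframe` are all hypothetical names chosen for illustration.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or explain how to install it.

    `extra` is a hypothetical pip extra name; arcgis does not
    necessarily define extras with these names.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is needed for this feature. "
            f"Install it with: pip install 'arcgis[{extra}]'"
        ) from exc


def to_dataframe(records: list[dict]):
    """Hypothetical feature that needs pandas only when it is called."""
    pd = require_optional("pandas", extra="dataframe")
    return pd.DataFrame(records)
```

With this layout, importing the package never touches pandas; only a call to `to_dataframe` does, and a missing install produces an error that names the extra to add.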

Describe alternatives you've considered The first alternative I tried was continuing to use arcgis as is and tolerating the hundreds of megabytes of additional dependencies it would pull in that we didn't need, and the extra 10+ minutes of dependency resolving time this would incur every time we need to update our own dependencies.

The current alternative I use (which is very time consuming and painful for me) is to perform the operations I do need with arcgis, reverse engineer the requests it makes to the REST API, then re-implement the minimal parts of the arcgis package I need for my own application.

owenlamont avatar Oct 12 '23 02:10 owenlamont

@owenlamont Thanks for bringing this up, we are actively working on this and will have updates in the future to help with this workflow.

nanaeaubry avatar Oct 12 '23 06:10 nanaeaubry

I think you could consider a similar approach to Dask. You could offer an ArcGIS Core package that installs with one of the following commands:

pip install arcgis-core

or

conda install arcgis-core

Dask documentation for more information: https://docs.dask.org/en/stable/install.html

hildermesmedeiros avatar Oct 12 '23 13:10 hildermesmedeiros

@hildermesmedeiros @owenlamont if you wouldn't mind, could you explain which areas of the Python API you mainly use and which areas you never touch?

achapkowski avatar Oct 12 '23 13:10 achapkowski

For my current application it's CRUD operations around FeatureLayers. More specifically, I want to programmatically upload geojson files, publish them as FeatureLayers, replace the geojson file currently associated with an existing FeatureLayer, and in future maybe do some other minor update operations around renaming, deleting, or updating metadata for those types of entities.

owenlamont avatar Oct 12 '23 21:10 owenlamont

I know the story around optional dependencies in conda isn't so good (you probably have to publish multiple conda packages with different combinations of dependencies). Pip allows different dependency profiles where you could select the subset of dependencies you need, e.g.

pip install arcgis[all] or pip install arcgis[jupyter]

If you go down that path, I'd suggest that a user who just runs pip install arcgis gets only the minimal core dependencies.
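As a rough sketch of what such dependency profiles could look like with setuptools, the groups below reuse version specifiers from the list in the original post. Which package lands in which group, and the group names themselves (`jupyter`, `dataframe`, `plotting`), are assumptions for illustration, not the arcgis project's actual layout.

```python
# Hypothetical grouping of the arcgis 2.2.0 dependencies into a small
# core plus optional extras. The split is illustrative only.
CORE = [
    "requests >=2.27.1,<3",
    "requests-oauthlib",
    "urllib3>=1.21.1,<3",
]

EXTRAS = {
    "jupyter": ["notebook", "jupyterlab", "ipywidgets >=7,<8"],
    "dataframe": ["pandas >=2.0.0,<3", "numpy >=1.21.6"],
    "plotting": ["matplotlib", "pillow"],
}

# "all" is the union of every other group, de-duplicated and sorted.
EXTRAS["all"] = sorted({dep for group in EXTRAS.values() for dep in group})

# In a setup.py these would be passed to setuptools roughly as:
#   setup(name="arcgis", install_requires=CORE, extras_require=EXTRAS)
```

Under this layout `pip install arcgis` would pull in only `CORE`, while `pip install arcgis[jupyter]` or `pip install arcgis[all]` would add the matching groups.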

owenlamont avatar Oct 12 '23 22:10 owenlamont

No, this is a great story and these are great suggestions. Thank you for sharing this.

achapkowski avatar Oct 12 '23 23:10 achapkowski

Glad to hear it. With my above comment I just meant it's a bit trickier/harder to support optional dependencies in conda than it is in pip (even though in general I like conda better).

owenlamont avatar Oct 12 '23 23:10 owenlamont

If you're interested I know there's been discussion in the conda community for supporting that style of pip optional dependencies here and later here. I'm not quite sure what the current state of it is.

owenlamont avatar Oct 12 '23 23:10 owenlamont

Glad to hear it. With my above comment I just meant it's a bit trickier/harder to support optional dependencies in conda than it is in pip (even though in general I like conda better).

It is, we are looking into building a set of namespace packages for a future release. Having folks, like yourself, post how they use the API helps us understand how different users use the API. If you know of other use cases, please post more of them!

achapkowski avatar Oct 13 '23 10:10 achapkowski

To install with minimal dependencies, I do something similar to this guide: https://developers.arcgis.com/python/guide/install-and-set-up/

I'd like us to have the conda install arcgis-core idea, but it might mean more work overhead for you.

With many customers using the cloud, the fight over cost is increasing, and file size is one of the many costs to be fought these days. At least on my side it has become more frequent.

For example, a customer needed to verify that some key FeatureLayers features were being updated correctly. We created a script that runs every 5 minutes to check that everything is as expected; otherwise it writes the result out as JSON and GeoEvent sends an email.

Another client wanted to create a routing system using ArcGIS Server, but did not want the logic to be executed on the server. Instead it should run in the cloud on the same infrastructure, and they asked for it to be as small as possible and without a dependency on arcpy.

This kind of thing...

hildermesmedeiros avatar Oct 15 '23 19:10 hildermesmedeiros

Thanks, I missed that minimal dependencies documentation. Although I don't think that would work for us: we use pip-tools to maintain our application dependencies, and you can't specify that a package be installed with --no-deps with that.

Some of those minimal requests dependencies are platform/authentication-type specific themselves, right? I know when I implemented a Python client to authenticate and connect with the REST API (which is deployed to a Linux docker image) I did need requests and requests-oauthlib, but are things like requests-kerberos and several of those other dependencies just for Windows SSO auth?

owenlamont avatar Oct 15 '23 21:10 owenlamont

@owenlamont, no. Most are not Windows-bound. Most are just protocols or ways of doing authentication or automation in the backend.

  • requests-toolbelt: a collection of utilities that some users of python-requests might need but that do not belong in requests proper. How to keep the TCP connection alive? How to handle sockets? Those and more are implemented there.

  • requests-kerberos: Kerberos/GSSAPI authentication. It is a three-way authentication protocol and fairly safe, so it is used for auth on Linux, Windows, Mac... It is a protocol: if the company's IT implements it, you will be bound to respect it.

  • requests-oauthlib: a high-level implementation of OAuth 1 and OAuth 2 requests.

  • requests-gssapi: provides a fully backward-compatible shim for the old python-requests-kerberos library: simply replace import requests_kerberos with import requests_gssapi. Yeah, old stuff, life sucks. But I guess requests-kerberos could be used in this case, I don't know.

  • requests_ntlm2: implements the Windows NTLM auth method. Windows does the favour of using Kerberos but, if things go wrong, it falls back to old methods, just because it can, why not.
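Since each of these packages serves a different authentication scheme, a client could check which ones are installed before offering a scheme, rather than requiring all of them up front. A small sketch using only the standard library follows; the function name `installed_auth_backends` is hypothetical, and the import names are assumed to be the package names with underscores.

```python
from importlib.util import find_spec

# Assumed import names for the optional auth packages listed above.
AUTH_MODULES = [
    "requests_oauthlib",
    "requests_kerberos",
    "requests_gssapi",
    "requests_ntlm2",
]


def installed_auth_backends(candidates=AUTH_MODULES):
    """Return the candidate modules that can be found, without importing them."""
    return [name for name in candidates if find_spec(name) is not None]
```

For example, a Linux container that only installs requests-oauthlib would report just `requests_oauthlib`, and the Kerberos or NTLM code paths could stay disabled there.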

hildermesmedeiros avatar Oct 16 '23 12:10 hildermesmedeiros

My Docker build takes 11 minutes to build, but most of that time (around 8 minutes) is spent installing ujson which is required by arcgis.

CallumNZ avatar Mar 05 '24 23:03 CallumNZ

My issue is I want to use the arcgis library for its ArcGIS REST API client... in a server application... and all the dependencies bloat the container. It would be better if the heavy dependencies could be spun out, or the REST client pushed out to a separate package that depends only on the minimum needed to interact with the REST API.

ruckc avatar May 16 '24 11:05 ruckc

This improvement would be great. I also have an 11 minute container build. A common build pattern for containers is to pip install packages from a requirements.txt file, which doesn't support the --no-deps argument: https://github.com/pypa/pip/pull/10837
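One workaround some builds use is a two-step install: first the small set of dependencies that are actually needed, then the package itself with --no-deps as a separate pip invocation. The sketch below only builds the two command lines; the helper names are hypothetical and the dependency list passed in is an example, not a verified minimal set for arcgis.

```python
import subprocess
import sys


def no_deps_install_commands(package, minimal_deps):
    """Build two pip invocations: the needed dependencies, then the package alone."""
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        pip + list(minimal_deps),        # step 1: only what you actually use
        pip + ["--no-deps", package],    # step 2: the package without its deps
    ]


def run_install(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Usage in a build script would look like `run_install(no_deps_install_commands("arcgis", ["requests", "requests-oauthlib"]))`. The trade-off is that pip no longer checks that the package's declared dependencies are satisfied, so a missing one only shows up at import or call time.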

MartyP233 avatar May 17 '24 00:05 MartyP233

Hi everyone!

Thanks for all the input and opinions, please continue to provide your input as this helps us when choosing to prioritize enhancements. For this issue in particular, we have been actively working on it and there will be improvements and changes made to make the package lighter and split up in a more optimized way. We don't have a set timeline to provide as of now since there are some bigger changes involved but we will update the post when possible with more information.

nanaeaubry avatar May 17 '24 06:05 nanaeaubry