maggma icon indicating copy to clipboard operation
maggma copied to clipboard

[Feature Request]: Leverage optional dependency groups to reduce dependency count

Open rkingsbury opened this issue 1 year ago • 6 comments

Problem

As recently surfaced in a peer review of a package that depends on maggma, maggma introduces a LOT of dependencies (ultimately because it so versatile). However, any given user of maggma is probably only using it to connect 2 or 3 types of Store and might not make use of many of these.

├── maggma [required: >=0.57.5, installed: 0.63.3]
│   ├── aioitertools [required: >=0.5.1, installed: 0.11.0]
│   │   └── typing-extensions [required: >=4.0, installed: 4.10.0]
│   ├── boto3 [required: >=1.20.41, installed: 1.34.49]
│   │   ├── botocore [required: >=1.34.49,<1.35.0, installed: 1.34.49]
│   │   │   ├── jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
│   │   │   ├── python-dateutil [required: >=2.1,<3.0.0, installed: 2.8.2]
│   │   │   │   └── six [required: >=1.5, installed: 1.16.0]
│   │   │   └── urllib3 [required: >=1.25.4,<1.27, installed: 1.26.18]
│   │   ├── jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
│   │   └── s3transfer [required: >=0.10.0,<0.11.0, installed: 0.10.0]
│   │       └── botocore [required: >=1.33.2,<2.0a.0, installed: 1.34.49]
│   │           ├── jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
│   │           ├── python-dateutil [required: >=2.1,<3.0.0, installed: 2.8.2]
│   │           │   └── six [required: >=1.5, installed: 1.16.0]
│   │           └── urllib3 [required: >=1.25.4,<1.27, installed: 1.26.18]
│   ├── dnspython [required: >=1.16.0, installed: 2.6.1]
│   ├── fastapi [required: >=0.42.0, installed: 0.110.0]
│   │   ├── pydantic [required: >=1.7.4,<3.0.0,!=2.1.0,!=2.0.1,!=2.0.0,!=1.8.1,!=1.8, installed: 2.6.2]
│   │   │   ├── annotated-types [required: >=0.4.0, installed: 0.6.0]
│   │   │   ├── pydantic-core [required: ==2.16.3, installed: 2.16.3]
│   │   │   │   └── typing-extensions [required: >=4.6.0,!=4.7.0, installed: 4.10.0]
│   │   │   └── typing-extensions [required: >=4.6.1, installed: 4.10.0]
│   │   ├── starlette [required: >=0.36.3,<0.37.0, installed: 0.36.3]
│   │   │   ├── anyio [required: >=3.4.0,<5, installed: 4.3.0]
│   │   │   │   ├── exceptiongroup [required: >=1.0.2, installed: 1.2.0]
│   │   │   │   ├── idna [required: >=2.8, installed: 3.6]
│   │   │   │   ├── sniffio [required: >=1.1, installed: 1.3.1]
│   │   │   │   └── typing-extensions [required: >=4.1, installed: 4.10.0]
│   │   │   └── typing-extensions [required: >=3.10.0, installed: 4.10.0]
│   │   └── typing-extensions [required: >=4.8.0, installed: 4.10.0]
│   ├── jsonschema [required: >=3.1.1, installed: 4.21.1]
│   │   ├── attrs [required: >=22.2.0, installed: 23.2.0]
│   │   ├── jsonschema-specifications [required: >=2023.03.6, installed: 2023.12.1]
│   │   │   └── referencing [required: >=0.31.0, installed: 0.33.0]
│   │   │       ├── attrs [required: >=22.2.0, installed: 23.2.0]
│   │   │       └── rpds-py [required: >=0.7.0, installed: 0.18.0]
│   │   ├── referencing [required: >=0.28.4, installed: 0.33.0]
│   │   │   ├── attrs [required: >=22.2.0, installed: 23.2.0]
│   │   │   └── rpds-py [required: >=0.7.0, installed: 0.18.0]
│   │   └── rpds-py [required: >=0.7.1, installed: 0.18.0]
│   ├── mongogrant [required: >=0.3.1, installed: 0.3.3]
│   │   ├── click [required: Any, installed: 8.1.7]
│   │   ├── flask [required: >=1.0, installed: 3.0.2]
│   │   │   ├── blinker [required: >=1.6.2, installed: 1.7.0]
│   │   │   ├── click [required: >=8.1.3, installed: 8.1.7]
│   │   │   ├── importlib-metadata [required: >=3.6.0, installed: 7.0.1]
│   │   │   │   └── zipp [required: >=0.5, installed: 3.17.0]
│   │   │   ├── itsdangerous [required: >=2.1.2, installed: 2.1.2]
│   │   │   ├── Jinja2 [required: >=3.1.2, installed: 3.1.3]
│   │   │   │   └── MarkupSafe [required: >=2.0, installed: 2.1.5]
│   │   │   └── werkzeug [required: >=3.0.0, installed: 3.0.1]
│   │   │       └── MarkupSafe [required: >=2.1.1, installed: 2.1.5]
│   │   ├── pymongo [required: >=3.8, installed: 4.6.2]
│   │   │   └── dnspython [required: >=1.16.0,<3.0.0, installed: 2.6.1]
│   │   └── requests [required: Any, installed: 2.31.0]
│   │       ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│   │       ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│   │       ├── idna [required: >=2.5,<4, installed: 3.6]
│   │       └── urllib3 [required: >=1.21.1,<3, installed: 1.26.18]
│   ├── mongomock [required: >=3.10.0, installed: 4.1.2]
│   │   ├── packaging [required: Any, installed: 23.2]
│   │   └── sentinels [required: Any, installed: 1.0.0]
│   ├── monty [required: >=2023.9.25, installed: 2024.2.26]
│   ├── msgpack [required: >=0.5.6, installed: 1.0.7]
│   ├── numpy [required: >=1.17.3, installed: 1.26.4]
│   ├── orjson [required: >=3.9.0, installed: 3.9.15]
│   ├── pydantic [required: >=2.0, installed: 2.6.2]
│   │   ├── annotated-types [required: >=0.4.0, installed: 0.6.0]
│   │   ├── pydantic-core [required: ==2.16.3, installed: 2.16.3]
│   │   │   └── typing-extensions [required: >=4.6.0,!=4.7.0, installed: 4.10.0]
│   │   └── typing-extensions [required: >=4.6.1, installed: 4.10.0]
│   ├── pydantic-settings [required: >=2.0.3, installed: 2.2.1]
│   │   ├── pydantic [required: >=2.3.0, installed: 2.6.2]
│   │   │   ├── annotated-types [required: >=0.4.0, installed: 0.6.0]
│   │   │   ├── pydantic-core [required: ==2.16.3, installed: 2.16.3]
│   │   │   │   └── typing-extensions [required: >=4.6.0,!=4.7.0, installed: 4.10.0]
│   │   │   └── typing-extensions [required: >=4.6.1, installed: 4.10.0]
│   │   └── python-dotenv [required: >=0.21.0, installed: 1.0.1]
│   ├── pydash [required: >=4.1.0, installed: 7.0.7]
│   │   └── typing-extensions [required: >=3.10,!=4.6.0, installed: 4.10.0]
│   ├── pymongo [required: >=4.2.0, installed: 4.6.2]
│   │   └── dnspython [required: >=1.16.0,<3.0.0, installed: 2.6.1]
│   ├── python-dateutil [required: >=2.8.2, installed: 2.8.2]
│   │   └── six [required: >=1.5, installed: 1.16.0]
│   ├── pyzmq [required: >=24.0.1, installed: 25.1.2]
│   ├── ruamel.yaml [required: <0.18, installed: 0.17.40]
│   │   └── ruamel.yaml.clib [required: >=0.2.7, installed: 0.2.8]
│   ├── setuptools [required: Any, installed: 69.1.1]
│   ├── sshtunnel [required: >=0.1.5, installed: 0.4.0]
│   │   └── paramiko [required: >=2.7.2, installed: 3.4.0]
│   │       ├── bcrypt [required: >=3.2, installed: 4.1.2]
│   │       ├── cryptography [required: >=3.3, installed: 42.0.5]
│   │       │   └── cffi [required: >=1.12, installed: 1.16.0]
│   │       │       └── pycparser [required: Any, installed: 2.21]
│   │       └── PyNaCl [required: >=1.5, installed: 1.5.0]
│   │           └── cffi [required: >=1.4.1, installed: 1.16.0]
│   │               └── pycparser [required: Any, installed: 2.21]
│   ├── tqdm [required: >=4.19.6, installed: 4.66.2]
│   └── uvicorn [required: >=0.18.3, installed: 0.27.1]
│       ├── click [required: >=7.0, installed: 8.1.7]
│       ├── h11 [required: >=0.8, installed: 0.14.0]
│       └── typing-extensions [required: >=4.0, installed: 4.10.0]

Proposed Solution

Our setup.py already leverages optional dependency groups for certain Store. My suggestion is to double-down on this concept so that by default, maggma is installed only with a mininmum set of depdendencies to enable

  • MemoryStore
  • JSONStore
  • MongoStore

With all other more specialized stores requiring optional dependency groups.

Our current setup.py

    extras_require={
        "vault": ["hvac>=0.9.5"],
        "memray": ["memray>=1.7.0"],
        "montydb": ["montydb>=2.3.12"],
        "notebook_runner": ["IPython>=8.11", "nbformat>=5.0", "regex>=2020.6"],
        "azure": ["azure-storage-blob>=12.16.0", "azure-identity>=1.12.0"],
        "open_data": ["pandas>=2.1.4", "jsonlines>=4.0.0"],
        "testing": [
            "pytest",
            "pytest-cov",
            "pytest-mock",
            "pytest-asyncio",
            "pytest-xdist",
            "pre-commit",
            "moto",
            "ruff",
            "responses<0.22.0",
            "types-pyYAML",
            "types-setuptools",
            "types-python-dateutil",
            "starlette[full]",
        ],
        "docs": [
            "mkdocs>=1.4.0",
            "mkdocs-material>=8.3.9",
            "mkdocs-minify-plugin>=0.5.0",
            "mkdocstrings[python]>=0.18.1",
            "jinja2<3.2.0",
        ],
    },

Related, the API components carry a whole separate set of dependencies. Perhaps those should be broken into their own dependency group.

Alternatives

An alternative would be to break maggma into namespace packages for Builder , Store, and the API components. I'm not a fan of this option as I find maintaining and installing these really confusing (see: emmet!)

rkingsbury avatar Feb 27 '24 16:02 rkingsbury

Interested in thoughts from @munrojm , @jmmshn , @Andrew-S-Rosen and others on this

rkingsbury avatar Feb 27 '24 16:02 rkingsbury

I think that your proposal makes sense, with updated documentation to reflect the "extras" needed for a given Store. From the list of dependencies shown though, it's not initially clear to me what further groups you would have. Any thoughts on that?

I definitely would avoid the namespace stuff for the reasons you stated.

Andrew-S-Rosen avatar Feb 27 '24 16:02 Andrew-S-Rosen

I think that your proposal makes sense, with updated documentation to reflect the "extras" needed for a given Store. From the list of dependencies shown though, it's not initially clear to me what further groups you would have. Any thoughts on that?

I definitely would avoid the namespace stuff for the reasons you stated.

Based on the graph fastapi, boto3, and mongogrant all introduce a lot of extras. Those are needed for, respectively, the API components, S3Store / OpenDataStore, MongoGrantStore (aside - is anyone still using this?). It seems like each of those could be flagged as requiring optional extras. What do you think?

rkingsbury avatar Feb 27 '24 16:02 rkingsbury

Also, it doesn't appear that dnspython or pyzmq are used anywhere in the code. Does anyone know if these have a function that isn't exposed via import statements? (I'm not too familiar with these)

rkingsbury avatar Feb 27 '24 16:02 rkingsbury

Makes sense re: fastapi, boto3, and mogogrant. I'd support their use as optional dependencies.

Re: pyzmq, I think you may need to look at the zmq import, e.g. here.

Andrew-S-Rosen avatar Feb 27 '24 17:02 Andrew-S-Rosen

I am all for this. Would just want to make sure it is appropriately scoped w.r.t to the default store types supported. My opinion is that It would be nice to support the S3Store / OpenDataStore out of the box. MongoGrantStore should most likely just be removed...

FYI, pyzmq is only used for the distributed builder running.

munrojm avatar Feb 27 '24 17:02 munrojm

All - revisiting this proposal after a few months away. I've made mongogrant an optional dependency group. I want to ask your thoughts on the following additional changes:

  1. Move fastapi (and any other API-only deps) into an optional [api] dependency group. It seems only a fraction of maggma users are likely to use the api components, especially because they aren't really documented right now, and doing so would substantially reduce the dependency count of a standard installation
  2. Per @munrojm suggestion - let's eliminate the opendata group and install pandas and jsonlines by default. Neither is heavy on sub-dependencies and that way we support Opendata out of the box.

We can also update the docs with Note boxes wherever optional dependencies are required.

rkingsbury avatar Jun 20 '24 19:06 rkingsbury

I agree on both accounts.

Andrew-S-Rosen avatar Jun 20 '24 19:06 Andrew-S-Rosen

One more vote for

  • Keeping the boto3 stuff, IMO >20% of people will want to use it even if it's just indirectly while using jobflow/atomate2.

  • Removing the mongogrant stuff, although we should consider moving the credential parsing part of the mongogrant into maggma so we can still use MongograntStores (with possible renames, I think this can save people from potential credential leaks in their scripts) I think maggma just uses this one function: https://github.com/materialsproject/maggma/blob/bc3d73f38be9ca3747d82b4c7fbb6e5e38660a48/src/maggma/stores/advanced_stores.py#L78C22-L78C50

  • Moving the api dependencies to optional.

jmmshn avatar Jun 21 '24 16:06 jmmshn

OK, as of release 0.68.6 [api] is an optional dependency group and opendata components are installed by default (as is boto3, which was always the case).

I didn't really mean to merge PR #970 ; it somehow got lumped in with another one that merged so apologies for that. But it seems most here are in favor of the change so I'll leave it as-is unless someone raises objections.

I think we've tackled the low-hanging fruit here w.r.t dependency groups, so I'll consider this issue closed.

rkingsbury avatar Jun 21 '24 17:06 rkingsbury