maggma
maggma copied to clipboard
[Feature Request]: Leverage optional dependency groups to reduce dependency count
Problem
As recently surfaced in a peer review of a package that depends on maggma
, maggma
introduces a LOT of dependencies (ultimately because it so versatile). However, any given user of maggma
is probably only using it to connect 2 or 3 types of Store
and might not make use of many of these.
├── maggma [required: >=0.57.5, installed: 0.63.3]
│ ├── aioitertools [required: >=0.5.1, installed: 0.11.0]
│ │ └── typing-extensions [required: >=4.0, installed: 4.10.0]
│ ├── boto3 [required: >=1.20.41, installed: 1.34.49]
│ │ ├── botocore [required: >=1.34.49,<1.35.0, installed: 1.34.49]
│ │ │ ├── jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
│ │ │ ├── python-dateutil [required: >=2.1,<3.0.0, installed: 2.8.2]
│ │ │ │ └── six [required: >=1.5, installed: 1.16.0]
│ │ │ └── urllib3 [required: >=1.25.4,<1.27, installed: 1.26.18]
│ │ ├── jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
│ │ └── s3transfer [required: >=0.10.0,<0.11.0, installed: 0.10.0]
│ │ └── botocore [required: >=1.33.2,<2.0a.0, installed: 1.34.49]
│ │ ├── jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
│ │ ├── python-dateutil [required: >=2.1,<3.0.0, installed: 2.8.2]
│ │ │ └── six [required: >=1.5, installed: 1.16.0]
│ │ └── urllib3 [required: >=1.25.4,<1.27, installed: 1.26.18]
│ ├── dnspython [required: >=1.16.0, installed: 2.6.1]
│ ├── fastapi [required: >=0.42.0, installed: 0.110.0]
│ │ ├── pydantic [required: >=1.7.4,<3.0.0,!=2.1.0,!=2.0.1,!=2.0.0,!=1.8.1,!=1.8, installed: 2.6.2]
│ │ │ ├── annotated-types [required: >=0.4.0, installed: 0.6.0]
│ │ │ ├── pydantic-core [required: ==2.16.3, installed: 2.16.3]
│ │ │ │ └── typing-extensions [required: >=4.6.0,!=4.7.0, installed: 4.10.0]
│ │ │ └── typing-extensions [required: >=4.6.1, installed: 4.10.0]
│ │ ├── starlette [required: >=0.36.3,<0.37.0, installed: 0.36.3]
│ │ │ ├── anyio [required: >=3.4.0,<5, installed: 4.3.0]
│ │ │ │ ├── exceptiongroup [required: >=1.0.2, installed: 1.2.0]
│ │ │ │ ├── idna [required: >=2.8, installed: 3.6]
│ │ │ │ ├── sniffio [required: >=1.1, installed: 1.3.1]
│ │ │ │ └── typing-extensions [required: >=4.1, installed: 4.10.0]
│ │ │ └── typing-extensions [required: >=3.10.0, installed: 4.10.0]
│ │ └── typing-extensions [required: >=4.8.0, installed: 4.10.0]
│ ├── jsonschema [required: >=3.1.1, installed: 4.21.1]
│ │ ├── attrs [required: >=22.2.0, installed: 23.2.0]
│ │ ├── jsonschema-specifications [required: >=2023.03.6, installed: 2023.12.1]
│ │ │ └── referencing [required: >=0.31.0, installed: 0.33.0]
│ │ │ ├── attrs [required: >=22.2.0, installed: 23.2.0]
│ │ │ └── rpds-py [required: >=0.7.0, installed: 0.18.0]
│ │ ├── referencing [required: >=0.28.4, installed: 0.33.0]
│ │ │ ├── attrs [required: >=22.2.0, installed: 23.2.0]
│ │ │ └── rpds-py [required: >=0.7.0, installed: 0.18.0]
│ │ └── rpds-py [required: >=0.7.1, installed: 0.18.0]
│ ├── mongogrant [required: >=0.3.1, installed: 0.3.3]
│ │ ├── click [required: Any, installed: 8.1.7]
│ │ ├── flask [required: >=1.0, installed: 3.0.2]
│ │ │ ├── blinker [required: >=1.6.2, installed: 1.7.0]
│ │ │ ├── click [required: >=8.1.3, installed: 8.1.7]
│ │ │ ├── importlib-metadata [required: >=3.6.0, installed: 7.0.1]
│ │ │ │ └── zipp [required: >=0.5, installed: 3.17.0]
│ │ │ ├── itsdangerous [required: >=2.1.2, installed: 2.1.2]
│ │ │ ├── Jinja2 [required: >=3.1.2, installed: 3.1.3]
│ │ │ │ └── MarkupSafe [required: >=2.0, installed: 2.1.5]
│ │ │ └── werkzeug [required: >=3.0.0, installed: 3.0.1]
│ │ │ └── MarkupSafe [required: >=2.1.1, installed: 2.1.5]
│ │ ├── pymongo [required: >=3.8, installed: 4.6.2]
│ │ │ └── dnspython [required: >=1.16.0,<3.0.0, installed: 2.6.1]
│ │ └── requests [required: Any, installed: 2.31.0]
│ │ ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│ │ ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│ │ ├── idna [required: >=2.5,<4, installed: 3.6]
│ │ └── urllib3 [required: >=1.21.1,<3, installed: 1.26.18]
│ ├── mongomock [required: >=3.10.0, installed: 4.1.2]
│ │ ├── packaging [required: Any, installed: 23.2]
│ │ └── sentinels [required: Any, installed: 1.0.0]
│ ├── monty [required: >=2023.9.25, installed: 2024.2.26]
│ ├── msgpack [required: >=0.5.6, installed: 1.0.7]
│ ├── numpy [required: >=1.17.3, installed: 1.26.4]
│ ├── orjson [required: >=3.9.0, installed: 3.9.15]
│ ├── pydantic [required: >=2.0, installed: 2.6.2]
│ │ ├── annotated-types [required: >=0.4.0, installed: 0.6.0]
│ │ ├── pydantic-core [required: ==2.16.3, installed: 2.16.3]
│ │ │ └── typing-extensions [required: >=4.6.0,!=4.7.0, installed: 4.10.0]
│ │ └── typing-extensions [required: >=4.6.1, installed: 4.10.0]
│ ├── pydantic-settings [required: >=2.0.3, installed: 2.2.1]
│ │ ├── pydantic [required: >=2.3.0, installed: 2.6.2]
│ │ │ ├── annotated-types [required: >=0.4.0, installed: 0.6.0]
│ │ │ ├── pydantic-core [required: ==2.16.3, installed: 2.16.3]
│ │ │ │ └── typing-extensions [required: >=4.6.0,!=4.7.0, installed: 4.10.0]
│ │ │ └── typing-extensions [required: >=4.6.1, installed: 4.10.0]
│ │ └── python-dotenv [required: >=0.21.0, installed: 1.0.1]
│ ├── pydash [required: >=4.1.0, installed: 7.0.7]
│ │ └── typing-extensions [required: >=3.10,!=4.6.0, installed: 4.10.0]
│ ├── pymongo [required: >=4.2.0, installed: 4.6.2]
│ │ └── dnspython [required: >=1.16.0,<3.0.0, installed: 2.6.1]
│ ├── python-dateutil [required: >=2.8.2, installed: 2.8.2]
│ │ └── six [required: >=1.5, installed: 1.16.0]
│ ├── pyzmq [required: >=24.0.1, installed: 25.1.2]
│ ├── ruamel.yaml [required: <0.18, installed: 0.17.40]
│ │ └── ruamel.yaml.clib [required: >=0.2.7, installed: 0.2.8]
│ ├── setuptools [required: Any, installed: 69.1.1]
│ ├── sshtunnel [required: >=0.1.5, installed: 0.4.0]
│ │ └── paramiko [required: >=2.7.2, installed: 3.4.0]
│ │ ├── bcrypt [required: >=3.2, installed: 4.1.2]
│ │ ├── cryptography [required: >=3.3, installed: 42.0.5]
│ │ │ └── cffi [required: >=1.12, installed: 1.16.0]
│ │ │ └── pycparser [required: Any, installed: 2.21]
│ │ └── PyNaCl [required: >=1.5, installed: 1.5.0]
│ │ └── cffi [required: >=1.4.1, installed: 1.16.0]
│ │ └── pycparser [required: Any, installed: 2.21]
│ ├── tqdm [required: >=4.19.6, installed: 4.66.2]
│ └── uvicorn [required: >=0.18.3, installed: 0.27.1]
│ ├── click [required: >=7.0, installed: 8.1.7]
│ ├── h11 [required: >=0.8, installed: 0.14.0]
│ └── typing-extensions [required: >=4.0, installed: 4.10.0]
Proposed Solution
Our setup.py
already leverages optional dependency groups for certain Store
. My suggestion is to double-down on this concept so that by default, maggma
is installed only with a mininmum set of depdendencies to enable
-
MemoryStore
-
JSONStore
-
MongoStore
With all other more specialized stores requiring optional dependency groups.
Our current setup.py
extras_require={
"vault": ["hvac>=0.9.5"],
"memray": ["memray>=1.7.0"],
"montydb": ["montydb>=2.3.12"],
"notebook_runner": ["IPython>=8.11", "nbformat>=5.0", "regex>=2020.6"],
"azure": ["azure-storage-blob>=12.16.0", "azure-identity>=1.12.0"],
"open_data": ["pandas>=2.1.4", "jsonlines>=4.0.0"],
"testing": [
"pytest",
"pytest-cov",
"pytest-mock",
"pytest-asyncio",
"pytest-xdist",
"pre-commit",
"moto",
"ruff",
"responses<0.22.0",
"types-pyYAML",
"types-setuptools",
"types-python-dateutil",
"starlette[full]",
],
"docs": [
"mkdocs>=1.4.0",
"mkdocs-material>=8.3.9",
"mkdocs-minify-plugin>=0.5.0",
"mkdocstrings[python]>=0.18.1",
"jinja2<3.2.0",
],
},
Related, the API
components carry a whole separate set of dependencies. Perhaps those should be broken into their own dependency group.
Alternatives
An alternative would be to break maggma
into namespace packages for Builder
, Store
, and the API components. I'm not a fan of this option as I find maintaining and installing these really confusing (see: emmet!)
Interested in thoughts from @munrojm , @jmmshn , @Andrew-S-Rosen and others on this
I think that your proposal makes sense, with updated documentation to reflect the "extras" needed for a given Store
. From the list of dependencies shown though, it's not initially clear to me what further groups you would have. Any thoughts on that?
I definitely would avoid the namespace stuff for the reasons you stated.
I think that your proposal makes sense, with updated documentation to reflect the "extras" needed for a given
Store
. From the list of dependencies shown though, it's not initially clear to me what further groups you would have. Any thoughts on that?I definitely would avoid the namespace stuff for the reasons you stated.
Based on the graph fastapi
, boto3
, and mongogrant
all introduce a lot of extras. Those are needed for, respectively, the API components, S3Store
/ OpenDataStore
, MongoGrantStore
(aside - is anyone still using this?). It seems like each of those could be flagged as requiring optional extras. What do you think?
Also, it doesn't appear that dnspython
or pyzmq
are used anywhere in the code. Does anyone know if these have a function that isn't exposed via import
statements? (I'm not too familiar with these)
Makes sense re: fastapi, boto3, and mogogrant. I'd support their use as optional dependencies.
Re: pyzmq
, I think you may need to look at the zmq
import, e.g. here.
I am all for this. Would just want to make sure it is appropriately scoped w.r.t to the default store types supported. My opinion is that It would be nice to support the S3Store
/ OpenDataStore
out of the box. MongoGrantStore
should most likely just be removed...
FYI, pyzmq is only used for the distributed builder running.
All - revisiting this proposal after a few months away. I've made mongogrant
an optional dependency group. I want to ask your thoughts on the following additional changes:
- Move
fastapi
(and any other API-only deps) into an optional[api]
dependency group. It seems only a fraction ofmaggma
users are likely to use the api components, especially because they aren't really documented right now, and doing so would substantially reduce the dependency count of a standard installation - Per @munrojm suggestion - let's eliminate the
opendata
group and installpandas
andjsonlines
by default. Neither is heavy on sub-dependencies and that way we support Opendata out of the box.
We can also update the docs with Note boxes wherever optional dependencies are required.
I agree on both accounts.
One more vote for
-
Keeping the
boto3
stuff, IMO >20% of people will want to use it even if it's just indirectly while using jobflow/atomate2. -
Removing the
mongogrant
stuff, although we should consider moving the credential parsing part of themongogrant
intomaggma
so we can still use MongograntStores (with possible renames, I think this can save people from potential credential leaks in their scripts) I thinkmaggma
just uses this one function: https://github.com/materialsproject/maggma/blob/bc3d73f38be9ca3747d82b4c7fbb6e5e38660a48/src/maggma/stores/advanced_stores.py#L78C22-L78C50 -
Moving the
api
dependencies to optional.
OK, as of release 0.68.6
[api] is an optional dependency group and opendata components are installed by default (as is boto3, which was always the case).
I didn't really mean to merge PR #970 ; it somehow got lumped in with another one that merged so apologies for that. But it seems most here are in favor of the change so I'll leave it as-is unless someone raises objections.
I think we've tackled the low-hanging fruit here w.r.t dependency groups, so I'll consider this issue closed.