vrs-python
vrs-python copied to clipboard
Eliminate "extra" dependencies
I'm not sure if a firm decision has been made on this but I've heard a few thoughts expressed on the matter:
- Rename "extras" to something more expressive ("tools" or something?)
- Merge the extra dependencies into the main dependencies
- Separate the models/serialization business and the extra tooling into separately installable packages ("core" vs "tools" ?)
edit: had another thought to toss out: move extras
to a different module level, a la
src
└── ga4gh
├── core
├── vrs
└── extras. # or "tools" or w/e
├── __init__.py
├── decorators.py
├── object_store.py
├── translator.py
└── vcf_annotation.py
└── utils
└── hgvs_tools.py
@ga4gh/vrs-python-maintainers @ga4gh/vr-maintainers should provide thoughts
- Rename "extras" to something more expressive ("tools" or something?)
I am pro more meaningful name. Tools is fine to me
- Merge the extra dependencies into the main dependencies
- Separate the models/serialization business and the extra tooling into separately installable packages ("core" vs "tools" ?)
I would prefer to keep them separate since in gene-normalizer, we only use the Pydantic models and would like to keep deps lightweight
I would prefer to keep them separate since in gene-normalizer, we only use the Pydantic models and would like to keep deps lightweight
Yeah I agree that simply merging the dependencies would cause some unnecessary difficulties for some niche but not infrequent use cases. I guess the question is
-
if we keep in same installable, is it weird to have the requirements for the most common use cases live under an optional dependency group?
-
if we separate out into different packages, will it suck to maintain a models/core/serialization package separately from a tools (translator, annotator) package?
final note: I noticed a few days ago that NetworkX is doing something like this -- you can theoretically install it without Numpy, etc but will hit import errors very quickly, so they recommend you just install with the optional [default]
dependency group: https://networkx.org/documentation/stable/install.html#install-the-released-version
I think it's okay to have a frequently used optional dependencies target. The core library functionality of the models and serialization stuff should be usable without those. Not being able to normalize alleles without it is a bit of a limitation.
If there wasn't such a size difference in the installed size between them I'd say it wouldn't be that big of a deal to merge them. Adding the extras
adds a lot (though I would expect most users to add them).
docker run -it --rm python:3.11 bash -c 'du -hs /usr/local/lib/python3.11/site-packages'
13M /usr/local/lib/python3.11/site-packages
docker run -it --rm python:3.11 bash -c 'pip install "ga4gh.vrs @ git+https://github.com/ga4gh/vrs-python" && du -hs /usr/local/lib/python3.11/site-packages'
34M /usr/local/lib/python3.11/site-packages
docker run -it --rm python:3.11 bash -c 'pip install "ga4gh.vrs[extras] @ git+https://github.com/ga4gh/vrs-python" && du -hs /usr/local/lib/python3.11/site-packages'
182M /usr/local/lib/python3.11/site-packages
@theferrit32 ~40MB of that is PySam (by way of HGVS -> SeqRepo -> PySam), which maybe explains why the DataProxy
stuff is reimplemented here instead of being imported from SeqRepo...