vrs-python icon indicating copy to clipboard operation
vrs-python copied to clipboard

Eliminate "extra" dependencies

Open jsstevenson opened this issue 10 months ago • 5 comments

I'm not sure if a firm decision has been made on this but I've heard a few thoughts expressed on the matter:

  • Rename "extras" to something more expressive ("tools" or something?)
  • Merge the extra dependencies into the main dependencies
  • Separate the models/serialization business and the extra tooling into separately installable packages ("core" vs "tools" ?)

edit: had another thought to toss out: move extras to a different module level, a la

src
└── ga4gh
    ├── core
    ├── vrs    
    └── extras. # or "tools" or w/e
        ├── __init__.py
        ├── decorators.py
        ├── object_store.py
        ├── translator.py
        └── vcf_annotation.py
        └── utils
            └── hgvs_tools.py

jsstevenson avatar Apr 10 '24 12:04 jsstevenson

@ga4gh/vrs-python-maintainers @ga4gh/vr-maintainers should provide thoughts

korikuzma avatar Apr 10 '24 12:04 korikuzma

  • Rename "extras" to something more expressive ("tools" or something?)

I am pro more meaningful name. Tools is fine to me

  • Merge the extra dependencies into the main dependencies
  • Separate the models/serialization business and the extra tooling into separately installable packages ("core" vs "tools" ?)

I would prefer to keep them separate since in gene-normalizer, we only use the Pydantic models and would like to keep deps lightweight

korikuzma avatar Apr 10 '24 13:04 korikuzma

I would prefer to keep them separate since in gene-normalizer, we only use the Pydantic models and would like to keep deps lightweight

Yeah I agree that simply merging the dependencies would cause some unnecessary difficulties for some niche but not infrequent use cases. I guess the question is

  1. if we keep in same installable, is it weird to have the requirements for the most common use cases live under an optional dependency group?

  2. if we separate out into different packages, will it suck to maintain a models/core/serialization package separately from a tools (translator, annotator) package?

final note: I noticed a few days ago that NetworkX is doing something like this -- you can theoretically install it without Numpy, etc but will hit import errors very quickly, so they recommend you just install with the optional [default] dependency group: https://networkx.org/documentation/stable/install.html#install-the-released-version

jsstevenson avatar Apr 10 '24 14:04 jsstevenson

I think it's okay to have a frequently used optional dependencies target. The core library functionality of the models and serialization stuff should be usable without those. Not being able to normalize alleles without it is a bit of a limitation.

If there wasn't such a size difference in the installed size between them I'd say it wouldn't be that big of a deal to merge them. Adding the extras adds a lot (though I would expect most users to add them).

docker run -it --rm python:3.11 bash -c 'du -hs /usr/local/lib/python3.11/site-packages'

13M	/usr/local/lib/python3.11/site-packages


docker run -it --rm python:3.11 bash -c 'pip install "ga4gh.vrs @ git+https://github.com/ga4gh/vrs-python" && du -hs /usr/local/lib/python3.11/site-packages'

34M	/usr/local/lib/python3.11/site-packages


docker run -it --rm python:3.11 bash -c 'pip install "ga4gh.vrs[extras] @ git+https://github.com/ga4gh/vrs-python" && du -hs /usr/local/lib/python3.11/site-packages'

182M	/usr/local/lib/python3.11/site-packages

theferrit32 avatar Apr 11 '24 17:04 theferrit32

@theferrit32 ~40MB of that is PySam (by way of HGVS -> SeqRepo -> PySam), which maybe explains why the DataProxy stuff is reimplemented here instead of being imported from SeqRepo...

jsstevenson avatar May 10 '24 13:05 jsstevenson