[recipe-spec] Allow for arbitrary optional dependencies
Proposal:
Have a single top-level requirements section, as there is currently, with the required keys host and run, and then optionally allow any other keys to specify named groups of dependencies.
A requirements.build key would be handled the same as it is currently, but test dependencies would now live under an optional requirements.test key. Specifying test dependencies under requirements.test would then be consistent with how other dependencies are specified under requirements (i.e. requirements.build rather than build.requirements).
In this way mamba/boa could support the extras_require capability of setuptools/pip, which is very widely used and very useful.
As in setuptools, the special-cased test requirements section could then be removed in favour of treating test dependencies like any other optional dependency.

To get the benefit of this new capability, all the requirements would need to be saved in the package metadata, and mamba would need to allow users to install any named dependency group from that metadata.
Currently it is very awkward for users to create either a build or a test environment, as there is no way to tell mamba to create an environment with those specific deps.
The current recommendation, creating separate outputs for any optional dependencies, is awkward to specify and clutters up package repositories with multiple metadata-only packages for every single build.
This is mostly a re-hashing of my arguments from the original hackmd document, as a GitHub issue may provide a better forum for feedback and discussion.
...and having just written this I now see https://github.com/conda/ceps/pull/9
Example from dask:

```python
extras_require: dict[str, list[str]] = {
    "array": ["numpy >= 1.18"],
    "bag": [],  # keeping for backwards compatibility
    "dataframe": ["numpy >= 1.18", "pandas >= 1.0"],
    "distributed": ["distributed == 2022.02.0"],
    "diagnostics": [
        "bokeh >= 2.1.1",
        "jinja2",
    ],
    "delayed": [],  # keeping for backwards compatibility
}
extras_require["complete"] = sorted({v for req in extras_require.values() for v in req})

# after complete is set, add in test
extras_require["test"] = [
    "pytest",
    "pytest-rerunfailures",
    "pytest-xdist",
    "pre-commit",
]

install_requires = [
    "cloudpickle >= 1.1.1",
    "fsspec >= 0.6.0",
    "packaging >= 20.0",
    "partd >= 0.3.10",
    "pyyaml >= 5.3.1",
    "toolz >= 0.8.2",
]
```
And the corresponding proposed recipe syntax:

```yaml
requirements:
  host:
    - python >=3.7
  run:
    - python >=3.7
    - cloudpickle >=1.1.1
    - fsspec >=0.6.0
    - packaging >=20.0
    - partd >=0.3.10
    - pyyaml >=5.3.1
    - toolz >=0.8.2
  run_constrained:
    - openssl !=1.1.1e
  array:
    - numpy >=1.18
  complete:
    - dask[array]
    - dask[dataframe]
    - dask[diagnostics]
    - dask[distributed]
  dataframe:
    - dask[array]
    - pandas >=1.0
  diagnostics:
    - bokeh >=2.1.1
    - jinja2
  distributed:
    - distributed ==2022.02.0
  test:
    - pytest
    - pytest-rerunfailures
    - pytest-xdist
    - pre-commit
```
Hey, I think we could quite easily create additional outputs for a given recipe inside of boa that automatically reference the top-level recipe using pin_subpackage(exact=True). It would be a cool exercise!
For the final package name we could either use a valid syntax for the package name, e.g. dask[array] maps to dask-array, or, if we want to make sure not to duplicate names, we could go for dask__array__ or something crazy like that. Or we could allow [ in the package name (although I don't know whether that maps well to file & folder names on all operating systems).
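The two naming schemes floated above could be sketched as a tiny helper. Both mappings are hypothetical suggestions from this discussion, not an agreed convention, and `output_name` is an invented function name:

```python
def output_name(package: str, extra: str, style: str = "dash") -> str:
    """Map a package/extra pair to a flattened output name.

    Both schemes below are hypothetical suggestions, not an agreed
    convention.
    """
    if style == "dash":
        # dask[array] -> dask-array (valid package-name syntax today)
        return f"{package}-{extra}"
    if style == "dunder":
        # dask[array] -> dask__array__ (avoids colliding with real names)
        return f"{package}__{extra}__"
    raise ValueError(f"unknown style: {style}")
```

The dash scheme risks colliding with a genuinely existing `dask-array` package, which is what the dunder scheme tries to avoid.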
I am not sure whether I would want to have these as additional top-level requirements keys, or whether we should put them one level down, under the run key.
Moving the test requirements under requirements.test (instead of test.requires) is something I thought about today, and maybe that is actually the first step in this direction.
> Example from dask:
I find the proposed recipe-syntax (same level as build/host/run) confusing and otherwise less-than-ideal. Could this not be in a dedicated requirements-section like extensions:?
So, I've been thinking about this a little bit more, and even today it is already possible to create a couple of subpackages quite easily using explicit outputs and run requirements:
```yaml
context:
  name: dask
  version: 0.1.0

package:
  name: '{{ name|lower }}'
  version: '{{ version }}'

build:
  number: 0

outputs:
  - package:
      name: dask
    requirements:
      host:
        - python
      run:
        - python
  - package:
      name: dask-array
    requirements:
      run:
        - "{{ pin_subpackage('dask', exact=True) }}"
        - numpy >=1.14
  - package:
      name: dask-dataframe
    requirements:
      run:
        - "{{ pin_subpackage('dask-array', exact=True) }}"
        - pandas
```
The way boa works, it is extremely fast to produce the two additional outputs (it takes basically no time) because of some architectural changes in boa vs. conda-build.
If we came up with a more concise syntax, it would essentially be pre-processed into the above syntax anyway. The magic pieces would be to automatically add the exact pin_subpackage and to expand the names from a shorter form.
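That pre-processing step could be sketched roughly as follows. The `expand_extras` helper and the `-` separator for output names are assumptions for illustration; the dict shape mirrors the explicit-outputs recipe YAML above:

```python
def expand_extras(name: str, extras: dict[str, list[str]]) -> list[dict]:
    """Expand a concise extras mapping into explicit boa-style outputs.

    Each extra becomes its own output that exact-pins the base package.
    Hypothetical sketch; the '-' name separator is an assumption.
    """
    outputs = [{"package": {"name": name}}]
    for extra, deps in extras.items():
        outputs.append({
            "package": {"name": f"{name}-{extra}"},
            "requirements": {
                # automatically added exact pin on the base package
                "run": [f"{{{{ pin_subpackage('{name}', exact=True) }}}}", *deps],
            },
        })
    return outputs
```

For example, `expand_extras("dask", {"array": ["numpy >=1.14"]})` would yield the base `dask` output plus a `dask-array` output whose run requirements start with the exact pin.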
I think the bracket syntax won't work in conda/mamba/boa right now. If we specify that we allow the bracket syntax in repodata.json, package names, etc., that would probably be the only thing we really need as a CEP.
Mamba would also need additional logic to translate dask[array, dataframe] into dask[array] dask[dataframe].
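That client-side translation could be sketched as below. This is a hypothetical helper; real conda/mamba match-spec parsing is considerably more involved:

```python
import re


def split_bracket_spec(spec: str) -> list[str]:
    """Translate "pkg[a, b]" into ["pkg[a]", "pkg[b]"].

    Hypothetical sketch of the translation suggested above; specs
    without bracket syntax pass through unchanged.
    """
    m = re.fullmatch(r"(?P<name>[^\[\]]+)\[(?P<extras>[^\[\]]+)\]", spec)
    if not m:
        return [spec]
    name = m.group("name").strip()
    return [f"{name}[{e.strip()}]" for e in m.group("extras").split(",")]
```

So `dask[array, dataframe]` would expand to the two single-extra specs, which could then be matched against individual subpackages.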
**All outputs from boa with resolved envs**
Output: dask 0.1.0 BN: 0
Variant:
Build:
╷ ╷ ╷ ╷
Dependency │ Version requirement │ Selected │ Build │ Channel
═════════════════╪═════════════════════╪═══════════╪════════════════════╪═════════════
│ │ │ │
Host │ │ │ │
python │ 3.9.* │ 3.9.10 │ h38ef502_2_cpython │ conda-forge
ncurses │ │ 6.3 │ hc470f4d_0 │ conda-forge
xz │ │ 5.2.5 │ h642e427_1 │ conda-forge
libzlib │ │ 1.2.11 │ hee7b306_1013 │ conda-forge
libffi │ │ 3.4.2 │ h3422bc3_5 │ conda-forge
ca-certificates │ │ 2021.10.8 │ h4653dfc_0 │ conda-forge
readline │ │ 8.1 │ hedafd6a_0 │ conda-forge
tk │ │ 8.6.12 │ he1e0b03_0 │ conda-forge
zlib │ │ 1.2.11 │ hee7b306_1013 │ conda-forge
openssl │ │ 3.0.0 │ h3422bc3_2 │ conda-forge
sqlite │ │ 3.37.0 │ h72a2b83_0 │ conda-forge
tzdata │ │ 2021e │ he74cb21_0 │ conda-forge
bzip2 │ │ 1.0.8 │ h4cc8a5f_0 │ local
python_abi │ │ 3.9 │ 2_cp39 │ conda-forge
setuptools │ │ 60.9.3 │ py39h2804cbe_0 │ conda-forge
wheel │ │ 0.37.1 │ pyhd8ed1ab_0 │ conda-forge
pip │ │ 22.0.3 │ pyhd8ed1ab_0 │ conda-forge
│ │ │ │
Run │ │ │ │
python │ │ 3.9.10 │ h38ef502_2_cpython │ conda-forge
python_abi │ 3.9.* *_cp39 │ 3.9 │ 2_cp39 │ conda-forge
ncurses │ │ 6.3 │ hc470f4d_0 │ conda-forge
xz │ │ 5.2.5 │ h642e427_1 │ conda-forge
libzlib │ │ 1.2.11 │ hee7b306_1013 │ conda-forge
libffi │ │ 3.4.2 │ h3422bc3_5 │ conda-forge
ca-certificates │ │ 2021.10.8 │ h4653dfc_0 │ conda-forge
readline │ │ 8.1 │ hedafd6a_0 │ conda-forge
tk │ │ 8.6.12 │ he1e0b03_0 │ conda-forge
zlib │ │ 1.2.11 │ hee7b306_1013 │ conda-forge
openssl │ │ 3.0.0 │ h3422bc3_2 │ conda-forge
sqlite │ │ 3.37.0 │ h72a2b83_0 │ conda-forge
tzdata │ │ 2021e │ he74cb21_0 │ conda-forge
bzip2 │ │ 1.0.8 │ h4cc8a5f_0 │ local
setuptools │ │ 60.9.3 │ py39h2804cbe_0 │ conda-forge
wheel │ │ 0.37.1 │ pyhd8ed1ab_0 │ conda-forge
pip │ │ 22.0.3 │ pyhd8ed1ab_0 │ conda-forge
╵ ╵ ╵ ╵
Output: dask-array 0.1.0 BN: 0
Variant:
Build:
╷ ╷ ╷ ╷
Dependency │ Version requirement │ Selected │ Build │ Channel
═════════════════╪═════════════════════════╪═════════════╪══════════════════════╪═════════════
│ │ │ │
Run │ │ │ │
dask │ PS 0.1.0 py39h687aae2_0 │ 0.1.0 │ py39h687aae2_0 │ local
numpy │ >=1.14 │ 1.22.2 │ py39h61a45d2_0 │ conda-forge
llvm-openmp │ │ 13.0.1 │ hf3c4609_0 │ conda-forge
libcxx │ │ 12.0.1 │ h168391b_1 │ conda-forge
ncurses │ │ 6.3 │ hc470f4d_0 │ conda-forge
xz │ │ 5.2.5 │ h642e427_1 │ conda-forge
libzlib │ │ 1.2.11 │ hee7b306_1013 │ conda-forge
libffi │ │ 3.4.2 │ h3422bc3_5 │ conda-forge
ca-certificates │ │ 2021.10.8 │ h4653dfc_0 │ conda-forge
libgfortran5 │ │ 11.0.1.dev0 │ hf114ba7_23 │ conda-forge
readline │ │ 8.1 │ hedafd6a_0 │ conda-forge
tk │ │ 8.6.12 │ he1e0b03_0 │ conda-forge
zlib │ │ 1.2.11 │ hee7b306_1013 │ conda-forge
openssl │ │ 3.0.0 │ h3422bc3_2 │ conda-forge
libgfortran │ │ 5.0.0.dev0 │ 11_0_1_hf114ba7_23 │ conda-forge
sqlite │ │ 3.37.0 │ h72a2b83_0 │ conda-forge
libopenblas │ │ 0.3.18 │ openmp_h5dd58f0_0 │ conda-forge
libblas │ │ 3.9.0 │ 13_osxarm64_openblas │ conda-forge
libcblas │ │ 3.9.0 │ 13_osxarm64_openblas │ conda-forge
liblapack │ │ 3.9.0 │ 13_osxarm64_openblas │ conda-forge
bzip2 │ │ 1.0.8 │ h4cc8a5f_0 │ local
tzdata │ │ 2021e │ he74cb21_0 │ conda-forge
python │ │ 3.9.10 │ h38ef502_2_cpython │ conda-forge
python_abi │ │ 3.9 │ 2_cp39 │ conda-forge
setuptools │ │ 60.9.3 │ py39h2804cbe_0 │ conda-forge
wheel │ │ 0.37.1 │ pyhd8ed1ab_0 │ conda-forge
pip │ │ 22.0.3 │ pyhd8ed1ab_0 │ conda-forge
╵ ╵ ╵ ╵
Output: dask-dataframe 0.1.0 BN: 0
Variant:
Build:
╷ ╷ ╷ ╷
Dependency │ Version requirement │ Selected │ Build │ Channel
═════════════════╪═════════════════════╪═════════════╪══════════════════════╪═════════════
│ │ │ │
Run │ │ │ │
dask-array │ PS 0.1.0 h60d57d3_0 │ 0.1.0 │ h60d57d3_0 │ local
pandas │ │ 1.4.1 │ py39h7f752ed_0 │ conda-forge
libcxx │ │ 12.0.1 │ h168391b_1 │ conda-forge
llvm-openmp │ │ 13.0.1 │ hf3c4609_0 │ conda-forge
ncurses │ │ 6.3 │ hc470f4d_0 │ conda-forge
xz │ │ 5.2.5 │ h642e427_1 │ conda-forge
libzlib │ │ 1.2.11 │ hee7b306_1013 │ conda-forge
libffi │ │ 3.4.2 │ h3422bc3_5 │ conda-forge
ca-certificates │ │ 2021.10.8 │ h4653dfc_0 │ conda-forge
libgfortran5 │ │ 11.0.1.dev0 │ hf114ba7_23 │ conda-forge
readline │ │ 8.1 │ hedafd6a_0 │ conda-forge
tk │ │ 8.6.12 │ he1e0b03_0 │ conda-forge
zlib │ │ 1.2.11 │ hee7b306_1013 │ conda-forge
openssl │ │ 3.0.0 │ h3422bc3_2 │ conda-forge
libgfortran │ │ 5.0.0.dev0 │ 11_0_1_hf114ba7_23 │ conda-forge
sqlite │ │ 3.37.0 │ h72a2b83_0 │ conda-forge
libopenblas │ │ 0.3.18 │ openmp_h5dd58f0_0 │ conda-forge
libblas │ │ 3.9.0 │ 13_osxarm64_openblas │ conda-forge
libcblas │ │ 3.9.0 │ 13_osxarm64_openblas │ conda-forge
liblapack │ │ 3.9.0 │ 13_osxarm64_openblas │ conda-forge
bzip2 │ │ 1.0.8 │ h4cc8a5f_0 │ local
tzdata │ │ 2021e │ he74cb21_0 │ conda-forge
python │ │ 3.9.10 │ h38ef502_2_cpython │ conda-forge
python_abi │ │ 3.9 │ 2_cp39 │ conda-forge
setuptools │ │ 60.9.3 │ py39h2804cbe_0 │ conda-forge
wheel │ │ 0.37.1 │ pyhd8ed1ab_0 │ conda-forge
pip │ │ 22.0.3 │ pyhd8ed1ab_0 │ conda-forge
six │ │ 1.16.0 │ pyh6c4a22f_0 │ conda-forge
pytz │ │ 2021.3 │ pyhd8ed1ab_0 │ conda-forge
python-dateutil │ │ 2.8.2 │ pyhd8ed1ab_0 │ conda-forge
numpy │ │ 1.22.2 │ py39h61a45d2_0 │ conda-forge
dask │ │ 0.1.0 │ py39h687aae2_0 │ local
╵ ╵ ╵ ╵
I see creating extra packages as more of a work-around. There is a lot more boilerplate, and it forces the user to deal with (publish) multiple binary artefacts. IMO, it would be preferable to have a single package that stores all dependencies, optional or not, as metadata.
With multiple packages, it again special-cases certain deps, e.g. the test, run and build deps.
If I want to create an environment capable of running a package without installing the package itself, there is the special-case --only-deps flag, but AFAIK there are no special-case flags to install either the test or build deps.
What do you do if you want to create an environment (outside of conda/boa build) to test the package? If there were a general solution for installing any named deps, this would be a solved problem.
My own current solution to that problem is to specify all deps other than build/host/run outside the recipe, as requirements-*.txt files, which our CI then installs with mamba install --file ...
It would be nicer if there were some syntax, possibly similar to the below, where you could install any set of named dependencies:

```
mamba install mypkg            # installs mypkg and the run deps
mamba install mypkg[run]
mamba install mypkg[test]
mamba install mypkg[build]
mamba install mypkg[docs]
mamba install mypkg[dev]
mamba install mypkg[db.pgsql]
# ...etc...
```
Yes, it's a lot more work than just leveraging existing functionality through multiple outputs, but maybe the cleaner syntax and UX (not having to deal with multiple artefacts) justify making the change.
I think the multi-packages / multi-output way would be by far more backward compatible.
I see your point -- if we consider having many (empty) packages to be expensive. Let's let this ferment a bit.
> I think the multi-packages / multi-output way would be by far more backward compatible.

I am suggesting we break backward compat here, but it might just be the time to do so. Once you already have to make some syntax changes, I think the inconvenience of making bigger changes is only marginal. Others may well have different tolerances for breakage though! 😆
Also, automated tooling such as souschef or grayskull should hopefully be able to help regardless of the final schema chosen.
> Let's let this ferment a bit
Yep, it's definitely something that would benefit from consideration by the wider community! :+1:
I'm happy to break backward compatibility of the recipe, but with the multiple outputs it would continue to work with regular conda/mamba.
If we want to follow your suggestion (a single package with multiple optional dependency sections), we need a different repodata.json, if I am not mistaken.
> we need different repodata.json if i am not mistaken.

Yeah, I think it will require a change to the repodata.json schema, which would likely make the packages uninstallable by conda (unless you published two versions, I guess).
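For illustration only, here is a purely hypothetical sketch of what such an extended repodata.json entry might look like. The `extras` key is invented for this discussion and is not part of any current schema, which is exactly why older conda would not understand it:

```python
# Hypothetical extended repodata.json entry; the "extras" key does NOT
# exist in the current schema and is shown only to illustrate the idea.
entry = {
    "name": "dask",
    "version": "0.1.0",
    "build": "pyhd8ed1ab_0",
    # today's schema: a flat list of run requirements
    "depends": ["cloudpickle >=1.1.1", "fsspec >=0.6.0", "python >=3.7"],
    # proposed: named optional dependency groups stored as metadata
    "extras": {
        "array": ["numpy >=1.18"],
        "dataframe": ["dask[array]", "pandas >=1.0"],
        "test": ["pytest", "pytest-xdist"],
    },
}
```

A client that understands the field could resolve `dask[array]` by unioning `depends` with `extras["array"]`; a client that does not would ignore the extra groups entirely.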