
[recipe-spec] Allow for arbitrary optional dependencies

Open dhirschfeld opened this issue 3 years ago • 12 comments

Proposal:

Have a single top-level requirements section, as there is currently, with the required keys host and run, and optionally allow any other keys to specify named dependency groups.

A requirements.build key would then be handled the same as it is currently, but test dependencies would now be given as an optional requirements.test key. Specifying test dependencies under requirements.test would be consistent with how other dependencies are specified under requirements (e.g. requirements.build rather than build.requirements).
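A minimal sketch of how tooling might read such a section, assuming Python and a plain dict already parsed from the recipe YAML. The function `split_requirements`, and the choice to keep `build` and `run_constrained` as special-cased standard keys, are hypothetical illustrations rather than boa's actual behaviour; every other key is treated as a named group.

```python
from typing import Dict, List, Tuple

# Keys the proposal makes mandatory in every `requirements` section.
REQUIRED_KEYS = ("host", "run")
# Assumption (hypothetical): existing standard keys that stay special-cased
# instead of being treated as named optional dependency groups.
STANDARD_OPTIONAL_KEYS = ("build", "run_constrained")


def split_requirements(
    requirements: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split a `requirements` mapping into standard sections and arbitrary
    named dependency groups (e.g. `test`, `docs`, `array`)."""
    missing = [key for key in REQUIRED_KEYS if key not in requirements]
    if missing:
        raise ValueError(f"requirements is missing required keys: {missing}")
    standard_keys = REQUIRED_KEYS + STANDARD_OPTIONAL_KEYS
    standard = {k: list(v) for k, v in requirements.items() if k in standard_keys}
    named = {k: list(v) for k, v in requirements.items() if k not in standard_keys}
    return standard, named
```

For example, a section with host, run, run_constrained, test and array keys would yield `test` and `array` as named groups, while the build tool keeps looking up `build` by name as it does today.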

In this way, mamba/boa could support an equivalent of the extras_require capability in setuptools/pip, which is very widely used and very useful.

As in setuptools, the special-cased test requirements section could be removed in favour of treating test deps like any other optional dependency group: [image]

Getting the benefit of this new capability would require saving all the requirements in the package metadata, and mamba allowing users to install any named dependency group stored there.

Currently, it's very awkward for users to create either a build or a test environment, as there's no way to tell mamba to create an environment with those specific deps.

The current recommendation to instead create separate outputs for any optional dependencies is awkward to specify and clutters up package repositories with multiple metadata-only packages for every single build.

dhirschfeld, Feb 17 '22

This is mostly a rehash of my arguments in the original HackMD document, since a GitHub issue may provide a better forum for feedback and discussion.

dhirschfeld, Feb 17 '22

...and having just written this I now see https://github.com/conda/ceps/pull/9

dhirschfeld, Feb 17 '22

Example from dask:

```python
extras_require: dict[str, list[str]] = {
    "array": ["numpy >= 1.18"],
    "bag": [],  # keeping for backwards compatibility
    "dataframe": ["numpy >= 1.18", "pandas >= 1.0"],
    "distributed": ["distributed == 2022.02.0"],
    "diagnostics": [
        "bokeh >= 2.1.1",
        "jinja2",
    ],
    "delayed": [],  # keeping for backwards compatibility
}
extras_require["complete"] = sorted({v for req in extras_require.values() for v in req})
# after complete is set, add in test
extras_require["test"] = [
    "pytest",
    "pytest-rerunfailures",
    "pytest-xdist",
    "pre-commit",
]

install_requires = [
    "cloudpickle >= 1.1.1",
    "fsspec >= 0.6.0",
    "packaging >= 20.0",
    "partd >= 0.3.10",
    "pyyaml >= 5.3.1",
    "toolz >= 0.8.2",
]
```

```yaml
requirements:
  host:
    - python >=3.7
  run:
    - python >=3.7
    - cloudpickle >=1.1.1
    - fsspec >=0.6.0
    - packaging >=20.0
    - partd >=0.3.10
    - pyyaml >=5.3.1
    - toolz >=0.8.2
  run_constrained:
    - openssl !=1.1.1e
  array:
    - numpy >=1.18
  complete:
    - dask[array]
    - dask[dataframe]
    - dask[diagnostics]
    - dask[distributed]
  dataframe:
    - dask[array]
    - pandas >=1.0
  diagnostics:
    - bokeh >=2.1.1
    - jinja2
  distributed:
    - distributed ==2022.02.0
  test:
    - pytest
    - pytest-rerunfailures
    - pytest-xdist
    - pre-commit
```
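Translating setuptools' extras_require into named groups like these is mostly mechanical, the kind of job recipe-generation tooling could take on. A rough sketch, assuming every requirement is a plain `name op version` string (no environment markers, no pip extras) and that the PyPI and conda-forge package names match; `to_conda_spec` and `extras_to_requirement_groups` are hypothetical helpers.

```python
import re
from typing import Dict, List

# A package name followed by an optional version specifier.
_NAME_AND_SPEC = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(.*?)\s*$")
# A comparison operator followed by whitespace, e.g. ">= 1.18".
_OPERATOR_SPACE = re.compile(r"(==|!=|>=|<=|~=|>|<)\s+")


def to_conda_spec(requirement: str) -> str:
    """Turn a simple PEP 508 string such as 'numpy >= 1.18' into the
    recipe style 'numpy >=1.18' (operator attached to the version)."""
    match = _NAME_AND_SPEC.match(requirement)
    if match is None:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    name, spec = match.groups()
    spec = _OPERATOR_SPACE.sub(r"\1", spec)   # ">= 1.18" -> ">=1.18"
    spec = re.sub(r"\s*,\s*", ",", spec)      # ">=1.0, <2" -> ">=1.0,<2"
    return f"{name} {spec}" if spec else name


def extras_to_requirement_groups(
    extras_require: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Convert each non-empty extras group into a named requirements group."""
    return {
        group: [to_conda_spec(req) for req in reqs]
        for group, reqs in extras_require.items()
        if reqs  # empty groups such as "bag" and "delayed" are dropped
    }
```

Unlike the hand-written recipe above, this sketch does not rewrite `complete` or `dataframe` in terms of `dask[...]` references; it only copies and normalizes each group.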

dhirschfeld, Feb 17 '22

Hey, I think boa could quite easily create additional outputs for a given recipe automatically, each referencing the top-level package via pin_subpackage(exact=True). It would be a cool exercise! For the final package name we could either use a valid package-name syntax, e.g. dask[array] maps to dask-array, or, if we want to make sure not to clash with existing names, go for dask__array__ or something crazy like that. Or we could allow [ in package names (although I don't know how well that maps to file and folder names on all operating systems).
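The two naming schemes can be sketched as a small mapping function. This is a hypothetical helper, not part of boa; the "dash" scheme is the more readable one, but it can collide with real packages (a hypothetical `image` extra of dask would map onto the name of the existing, unrelated `dask-image` package), which is what the "dunder" scheme guards against.

```python
import re

# Matches "name[extra]" with a single extra, e.g. "dask[array]".
_BRACKET = re.compile(r"^([A-Za-z0-9_.\-]+)\[([A-Za-z0-9_.\-]+)\]$")


def output_name(spec: str, scheme: str = "dash") -> str:
    """Map a bracketed spec to the name of a generated output package."""
    match = _BRACKET.match(spec)
    if match is None:
        raise ValueError(f"not of the form name[extra]: {spec!r}")
    package, extra = match.groups()
    if scheme == "dash":
        return f"{package}-{extra}"      # dask[array] -> dask-array
    if scheme == "dunder":
        return f"{package}__{extra}__"   # dask[array] -> dask__array__
    raise ValueError(f"unknown naming scheme: {scheme!r}")
```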

I am not sure whether I would want this as additional top-level requirements keys or nested one level down under the run key.

wolfv, Feb 17 '22

Moving the test requirements under requirements.test (instead of test.requires) is something I thought about today, and maybe that's actually the first step in this direction.

wolfv, Feb 17 '22

> Example from dask:

I find the proposed recipe syntax (groups at the same level as build/host/run) confusing and otherwise less than ideal. Could this not live in a dedicated requirements section like extensions:?

h-vetinari, Feb 18 '22

So, I've been thinking about this a bit more, and even today it's already quite easy to create a couple of subpackages using explicit outputs and run requirements:

```yaml
context:
  name: dask
  version: 0.1.0

package:
  name: '{{ name|lower }}'
  version: '{{ version }}'

build:
  number: 0

outputs:
  - package:
      name: dask
    requirements:
      host:
        - python
      run:
        - python
  - package:
      name: dask-array
    requirements:
      run:
        - "{{ pin_subpackage('dask', exact=True) }}"
        - numpy >=1.14
  - package:
      name: dask-dataframe
    requirements:
      run:
        - "{{ pin_subpackage('dask-array', exact=True) }}"
        - pandas
```

Because of some architectural differences between boa and conda-build, producing the two additional outputs is extremely fast (it takes basically no time).

If we came up with a more concise syntax, it would essentially be pre-processed into the syntax above anyway. The magic would be to automatically add the exact pin_subpackage and to expand the output names from a shorter form.
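That pre-processing step could look roughly like this: a hypothetical `expand_extras` helper that turns a short mapping of extras into output dicts shaped like the recipe above. It assumes the "dash" naming scheme, pins the base package exactly in every generated output, and rewrites references such as `dask[array]` into a pin on the matching generated output. Pinning the base package alongside `dask-array` is a simplification; it is redundant but consistent with the hand-written recipe.

```python
import re
from typing import Dict, List

# Matches "name[extra]" references to another extra, e.g. "dask[array]".
_BRACKET = re.compile(r"^([A-Za-z0-9_.\-]+)\[([A-Za-z0-9_.\-]+)\]$")


def pin(name: str) -> str:
    """Jinja expression that boa renders into an exact pin on `name`."""
    return "{{ pin_subpackage('%s', exact=True) }}" % name


def expand_extras(package: str, extras: Dict[str, List[str]]) -> List[dict]:
    """Generate one output per extra; the base output is written as usual."""
    outputs = []
    for extra, deps in extras.items():
        run = [pin(package)]
        for dep in deps:
            match = _BRACKET.match(dep)
            if match and match.group(1) == package:
                # Reference to another extra: pin its generated output.
                run.append(pin(f"{package}-{match.group(2)}"))
            else:
                run.append(dep)
        outputs.append({
            "package": {"name": f"{package}-{extra}"},
            "requirements": {"run": run},
        })
    return outputs
```

Dumping the result with a YAML library would give output entries equivalent to the `dask-array` and `dask-dataframe` outputs written by hand above.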

I think right now the bracket syntax won't work in conda/mamba/boa. If we specify that the bracket syntax is allowed in repodata.json, package names, etc., that would probably be the only thing we really need as a CEP.

Mamba should have additional logic to translate dask[array, dataframe] to dask[array] dask[dataframe].
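A sketch of that translation step, assuming a hypothetical client-side helper `expand_bracket_spec`; it splits the comma-separated extras and carries any trailing version constraint over to each expanded spec. Specs without brackets pass through unchanged.

```python
import re
from typing import List

# "name[extra1, extra2] <optional version constraint>"
_SPEC = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*\[([^\]]*)\]\s*(.*?)\s*$")


def expand_bracket_spec(spec: str) -> List[str]:
    """Expand 'dask[array, dataframe]' into ['dask[array]', 'dask[dataframe]']."""
    match = _SPEC.match(spec)
    if match is None:
        return [spec.strip()]  # no brackets: leave the spec alone
    name, extras, version = match.groups()
    names = [e.strip() for e in extras.split(",") if e.strip()]
    suffix = f" {version}" if version else ""
    # Empty brackets fall back to the plain package spec.
    return [f"{name}[{extra}]{suffix}" for extra in names] or [f"{name}{suffix}"]
```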

**All outputs from boa with resolved envs**
                                Output: dask 0.1.0 BN: 0
                                        Variant:
                                         Build:

                  ╷                     ╷           ╷                    ╷
  Dependency      │ Version requirement │ Selected  │ Build              │ Channel
 ═════════════════╪═════════════════════╪═══════════╪════════════════════╪═════════════
                  │                     │           │                    │
  Host            │                     │           │                    │
  python          │ 3.9.*               │ 3.9.10    │ h38ef502_2_cpython │ conda-forge
  ncurses         │                     │ 6.3       │ hc470f4d_0         │ conda-forge
  xz              │                     │ 5.2.5     │ h642e427_1         │ conda-forge
  libzlib         │                     │ 1.2.11    │ hee7b306_1013      │ conda-forge
  libffi          │                     │ 3.4.2     │ h3422bc3_5         │ conda-forge
  ca-certificates │                     │ 2021.10.8 │ h4653dfc_0         │ conda-forge
  readline        │                     │ 8.1       │ hedafd6a_0         │ conda-forge
  tk              │                     │ 8.6.12    │ he1e0b03_0         │ conda-forge
  zlib            │                     │ 1.2.11    │ hee7b306_1013      │ conda-forge
  openssl         │                     │ 3.0.0     │ h3422bc3_2         │ conda-forge
  sqlite          │                     │ 3.37.0    │ h72a2b83_0         │ conda-forge
  tzdata          │                     │ 2021e     │ he74cb21_0         │ conda-forge
  bzip2           │                     │ 1.0.8     │ h4cc8a5f_0         │ local
  python_abi      │                     │ 3.9       │ 2_cp39             │ conda-forge
  setuptools      │                     │ 60.9.3    │ py39h2804cbe_0     │ conda-forge
  wheel           │                     │ 0.37.1    │ pyhd8ed1ab_0       │ conda-forge
  pip             │                     │ 22.0.3    │ pyhd8ed1ab_0       │ conda-forge
                  │                     │           │                    │
  Run             │                     │           │                    │
  python          │                     │ 3.9.10    │ h38ef502_2_cpython │ conda-forge
  python_abi      │ 3.9.* *_cp39        │ 3.9       │ 2_cp39             │ conda-forge
  ncurses         │                     │ 6.3       │ hc470f4d_0         │ conda-forge
  xz              │                     │ 5.2.5     │ h642e427_1         │ conda-forge
  libzlib         │                     │ 1.2.11    │ hee7b306_1013      │ conda-forge
  libffi          │                     │ 3.4.2     │ h3422bc3_5         │ conda-forge
  ca-certificates │                     │ 2021.10.8 │ h4653dfc_0         │ conda-forge
  readline        │                     │ 8.1       │ hedafd6a_0         │ conda-forge
  tk              │                     │ 8.6.12    │ he1e0b03_0         │ conda-forge
  zlib            │                     │ 1.2.11    │ hee7b306_1013      │ conda-forge
  openssl         │                     │ 3.0.0     │ h3422bc3_2         │ conda-forge
  sqlite          │                     │ 3.37.0    │ h72a2b83_0         │ conda-forge
  tzdata          │                     │ 2021e     │ he74cb21_0         │ conda-forge
  bzip2           │                     │ 1.0.8     │ h4cc8a5f_0         │ local
  setuptools      │                     │ 60.9.3    │ py39h2804cbe_0     │ conda-forge
  wheel           │                     │ 0.37.1    │ pyhd8ed1ab_0       │ conda-forge
  pip             │                     │ 22.0.3    │ pyhd8ed1ab_0       │ conda-forge
                  ╵                     ╵           ╵                    ╵



                                 Output: dask-array 0.1.0 BN: 0
                                            Variant:
                                             Build:

                  ╷                         ╷             ╷                      ╷
  Dependency      │ Version requirement     │ Selected    │ Build                │ Channel
 ═════════════════╪═════════════════════════╪═════════════╪══════════════════════╪═════════════
                  │                         │             │                      │
  Run             │                         │             │                      │
  dask            │ PS 0.1.0 py39h687aae2_0 │ 0.1.0       │ py39h687aae2_0       │ local
  numpy           │ >=1.14                  │ 1.22.2      │ py39h61a45d2_0       │ conda-forge
  llvm-openmp     │                         │ 13.0.1      │ hf3c4609_0           │ conda-forge
  libcxx          │                         │ 12.0.1      │ h168391b_1           │ conda-forge
  ncurses         │                         │ 6.3         │ hc470f4d_0           │ conda-forge
  xz              │                         │ 5.2.5       │ h642e427_1           │ conda-forge
  libzlib         │                         │ 1.2.11      │ hee7b306_1013        │ conda-forge
  libffi          │                         │ 3.4.2       │ h3422bc3_5           │ conda-forge
  ca-certificates │                         │ 2021.10.8   │ h4653dfc_0           │ conda-forge
  libgfortran5    │                         │ 11.0.1.dev0 │ hf114ba7_23          │ conda-forge
  readline        │                         │ 8.1         │ hedafd6a_0           │ conda-forge
  tk              │                         │ 8.6.12      │ he1e0b03_0           │ conda-forge
  zlib            │                         │ 1.2.11      │ hee7b306_1013        │ conda-forge
  openssl         │                         │ 3.0.0       │ h3422bc3_2           │ conda-forge
  libgfortran     │                         │ 5.0.0.dev0  │ 11_0_1_hf114ba7_23   │ conda-forge
  sqlite          │                         │ 3.37.0      │ h72a2b83_0           │ conda-forge
  libopenblas     │                         │ 0.3.18      │ openmp_h5dd58f0_0    │ conda-forge
  libblas         │                         │ 3.9.0       │ 13_osxarm64_openblas │ conda-forge
  libcblas        │                         │ 3.9.0       │ 13_osxarm64_openblas │ conda-forge
  liblapack       │                         │ 3.9.0       │ 13_osxarm64_openblas │ conda-forge
  bzip2           │                         │ 1.0.8       │ h4cc8a5f_0           │ local
  tzdata          │                         │ 2021e       │ he74cb21_0           │ conda-forge
  python          │                         │ 3.9.10      │ h38ef502_2_cpython   │ conda-forge
  python_abi      │                         │ 3.9         │ 2_cp39               │ conda-forge
  setuptools      │                         │ 60.9.3      │ py39h2804cbe_0       │ conda-forge
  wheel           │                         │ 0.37.1      │ pyhd8ed1ab_0         │ conda-forge
  pip             │                         │ 22.0.3      │ pyhd8ed1ab_0         │ conda-forge
                  ╵                         ╵             ╵                      ╵



                             Output: dask-dataframe 0.1.0 BN: 0
                                          Variant:
                                           Build:

                  ╷                     ╷             ╷                      ╷
  Dependency      │ Version requirement │ Selected    │ Build                │ Channel
 ═════════════════╪═════════════════════╪═════════════╪══════════════════════╪═════════════
                  │                     │             │                      │
  Run             │                     │             │                      │
  dask-array      │ PS 0.1.0 h60d57d3_0 │ 0.1.0       │ h60d57d3_0           │ local
  pandas          │                     │ 1.4.1       │ py39h7f752ed_0       │ conda-forge
  libcxx          │                     │ 12.0.1      │ h168391b_1           │ conda-forge
  llvm-openmp     │                     │ 13.0.1      │ hf3c4609_0           │ conda-forge
  ncurses         │                     │ 6.3         │ hc470f4d_0           │ conda-forge
  xz              │                     │ 5.2.5       │ h642e427_1           │ conda-forge
  libzlib         │                     │ 1.2.11      │ hee7b306_1013        │ conda-forge
  libffi          │                     │ 3.4.2       │ h3422bc3_5           │ conda-forge
  ca-certificates │                     │ 2021.10.8   │ h4653dfc_0           │ conda-forge
  libgfortran5    │                     │ 11.0.1.dev0 │ hf114ba7_23          │ conda-forge
  readline        │                     │ 8.1         │ hedafd6a_0           │ conda-forge
  tk              │                     │ 8.6.12      │ he1e0b03_0           │ conda-forge
  zlib            │                     │ 1.2.11      │ hee7b306_1013        │ conda-forge
  openssl         │                     │ 3.0.0       │ h3422bc3_2           │ conda-forge
  libgfortran     │                     │ 5.0.0.dev0  │ 11_0_1_hf114ba7_23   │ conda-forge
  sqlite          │                     │ 3.37.0      │ h72a2b83_0           │ conda-forge
  libopenblas     │                     │ 0.3.18      │ openmp_h5dd58f0_0    │ conda-forge
  libblas         │                     │ 3.9.0       │ 13_osxarm64_openblas │ conda-forge
  libcblas        │                     │ 3.9.0       │ 13_osxarm64_openblas │ conda-forge
  liblapack       │                     │ 3.9.0       │ 13_osxarm64_openblas │ conda-forge
  bzip2           │                     │ 1.0.8       │ h4cc8a5f_0           │ local
  tzdata          │                     │ 2021e       │ he74cb21_0           │ conda-forge
  python          │                     │ 3.9.10      │ h38ef502_2_cpython   │ conda-forge
  python_abi      │                     │ 3.9         │ 2_cp39               │ conda-forge
  setuptools      │                     │ 60.9.3      │ py39h2804cbe_0       │ conda-forge
  wheel           │                     │ 0.37.1      │ pyhd8ed1ab_0         │ conda-forge
  pip             │                     │ 22.0.3      │ pyhd8ed1ab_0         │ conda-forge
  six             │                     │ 1.16.0      │ pyh6c4a22f_0         │ conda-forge
  pytz            │                     │ 2021.3      │ pyhd8ed1ab_0         │ conda-forge
  python-dateutil │                     │ 2.8.2       │ pyhd8ed1ab_0         │ conda-forge
  numpy           │                     │ 1.22.2      │ py39h61a45d2_0       │ conda-forge
  dask            │                     │ 0.1.0       │ py39h687aae2_0       │ local
                  ╵                     ╵             ╵                      ╵

wolfv, Feb 23 '22

I see creating extra packages as more of a work-around. There's a lot more boilerplate and it forces the user to deal with (publish) multiple binary artefacts. IMO, it would be preferable to have a single package that stored all dependencies, optional or not, as metadata.

The multiple-package approach also still special-cases certain deps, e.g. test, run and build deps.

If you want to create an environment capable of running a package without installing the package itself, there's the special-case --only-deps flag, but AFAIK there are no equivalent flags to install either the test or the build deps.

What do you do if you want to create an environment (outside of conda/boa build) to test the package? If there were a general solution for installing any deps this would be a solved problem.

My own current solution to that problem is to specify all deps other than build/host/run externally to the recipe, as requirements-*.txt files which our CI then installs with mamba install --file ...
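That workaround can be wrapped in a small helper that assembles the command line for a set of groups. A sketch, assuming the hypothetical file-naming convention `requirements-<group>.txt` in one directory; it only relies on `--file` being repeatable, as in conda's CLI.

```python
from pathlib import Path
from typing import Iterable, List


def mamba_install_args(groups: Iterable[str], root: Path = Path(".")) -> List[str]:
    """Build a `mamba install` command line covering the requested groups,
    one requirements-<group>.txt file per group."""
    args = ["mamba", "install"]
    for group in groups:
        path = root / f"requirements-{group}.txt"
        if not path.is_file():
            raise FileNotFoundError(f"no requirements file for group {group!r}: {path}")
        # --file may be repeated to combine several requirement files.
        args += ["--file", str(path)]
    return args
```

A CI job could then run it with `subprocess.run(mamba_install_args(["test", "docs"]), check=True)`, adding whatever environment flags it already uses.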

It would be nicer if there were some syntax, possibly similar to the one below, with which you could install any set of named dependencies:

```shell
mamba install mypkg  # installs mypkg and the run deps
mamba install mypkg[run]
mamba install mypkg[test]
mamba install mypkg[build]
mamba install mypkg[docs]
mamba install mypkg[dev]
mamba install mypkg[db.pgsql]
# ...etc...
```
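On the client side, such a request could be resolved against a single package record that stores its named groups. A sketch, assuming a hypothetical `extras` key next to the usual `depends` list in the package metadata (no such key exists today). It follows pip's extras semantics: groups are added on top of the package and its run deps. Whether `mypkg[build]` should also install the package itself is left open here.

```python
import re
from typing import Dict, List

# "name" or "name[group1,group2]", as in the proposed CLI syntax.
_SPEC = re.compile(r"^([A-Za-z0-9_.\-]+)(?:\[([^\]]*)\])?$")


def resolve(spec: str, record: Dict) -> List[str]:
    """Return the install specs for `spec`, given a hypothetical package
    record that stores named dependency groups under an `extras` key."""
    match = _SPEC.match(spec.strip())
    if match is None or match.group(1) != record["name"]:
        raise ValueError(f"spec {spec!r} does not refer to {record['name']!r}")
    groups = [g.strip() for g in (match.group(2) or "").split(",") if g.strip()]
    extras: Dict[str, List[str]] = record.get("extras", {})

    # Base behaviour: the package itself plus its run deps (`depends`).
    specs = [record["name"], *record["depends"]]
    for group in groups:
        if group == "run":
            continue  # run deps are always included already
        if group not in extras:
            raise KeyError(f"{record['name']} has no dependency group {group!r}")
        for dep in extras[group]:
            if dep not in specs:
                specs.append(dep)
    return specs
```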

Yes, it's a lot more work than just leveraging existing functionality through multiple outputs, but maybe the cleaner syntax and UX (not having to deal with multiple artefacts) justify making the change.

dhirschfeld, Feb 23 '22

I think the multi-packages / multi-output way would be by far more backward compatible.

I see your point -- if we consider having many (empty) packages to be expensive. Let's let this ferment a bit.

wolfv, Feb 23 '22

> I think the multi-packages / multi-output way would be by far more backward compatible.

I am suggesting we break backward compat here, but it might just be the time to do so. Once you already have to make some syntax changes, I think the inconvenience from making bigger changes is only marginal. Others may well have different tolerances for breakage though! 😆

Also, automated tooling such as souschef or grayskull should hopefully be able to help regardless of the final schema chosen.

> Let's let this ferment a bit

Yep, it's definitely something that would benefit from consideration by the wider community! :+1:

dhirschfeld, Feb 23 '22

I'm happy to break backward compatibility of the recipe but with the multiple outputs it would continue to work with regular conda/mamba.

If we want to follow your suggestion (a single package with multiple optional dependency sections), we need a different repodata.json, if I am not mistaken.

wolfv, Feb 24 '22

> we need different repodata.json if I am not mistaken.

Yeah, I think it will require a change to the repodata.json schema which would likely make it uninstallable by conda (unless you published 2 versions I guess)

dhirschfeld, Feb 24 '22