conda-lock

Pinning subset requirements

Open xhochy opened this issue 4 years ago • 20 comments

When developing, we install many more dependencies than at runtime. We would still like to pin the runtime and the development/CI dependencies to the exact same versions. This should result in two lockfiles where one is a subset of the other.

Given the following environment.yml:

name: nyc-taxi-fare-prediction-deployment-example
channels:
  - conda-forge
  - nodefaults
dependencies:
  - click
  - jupyterlab  # [dev]

and the following command: conda-lock -f environment.yml --subset "core|dev=0"

I would expect two files, conda-linux-64.lock and conda-linux-64-core.lock, where the latter omits jupyterlab and its dependencies but pins all other packages to the same versions.
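To make the selector idea concrete, here is a minimal Python sketch of how the `# [dev]` comments in the dependency list above might be read. The helper name `split_by_selector` and the regular expression are assumptions made for illustration; nothing here is existing conda-lock code.

```python
import re

# Hypothetical helper, not part of conda-lock: matches "- spec" entries with an
# optional trailing "# [marker]" selector comment.
SELECTOR = re.compile(r"^\s*-\s*(?P<spec>[^#]+?)\s*(?:#\s*\[(?P<marker>\w+)\])?\s*$")


def split_by_selector(dependency_lines):
    """Group dependency specs by selector; unmarked specs go under the key None."""
    groups = {}
    for line in dependency_lines:
        match = SELECTOR.match(line)
        if match is None:
            continue  # not a simple "- spec" entry, skip it
        groups.setdefault(match.group("marker"), []).append(match.group("spec"))
    return groups


lines = ["  - click", "  - jupyterlab  # [dev]"]
groups = split_by_selector(lines)
print(groups)  # {None: ['click'], 'dev': ['jupyterlab']}
```

With a grouping like this, the core subset would be everything under `None`, and the full environment would be the union of all groups.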

Does this sound like a reasonable proposal? If so, I would start working on this.

xhochy avatar Feb 13 '21 21:02 xhochy

So the way we can do this at the moment is by passing multiple files as a compound specification:

conda-lock -f base.yml -f specific.yml -p linux-64 --filename-format "specific-{platform}.lock"

mariusvniekerk avatar Mar 04 '21 08:03 mariusvniekerk

In the general case, though, an additional dependency can invalidate the entire solve, so I'm not sure we can treat these as atomic.

mariusvniekerk avatar Mar 04 '21 08:03 mariusvniekerk

Just a small comment to say that such a feature would be awesome to have on the conda and mamba side also when creating an env. See https://github.com/conda/conda/issues/10398 for something a bit different.

I really like this idea of using selectors to specify a subset env.

hadim avatar Mar 19 '21 01:03 hadim

Rather than merging the specifications successively, I'd expect that a more accurate approximation would be:

  1. First install all dependencies
  2. Export as conda-linux-64.lock
  3. Uninstall dev dependencies
  4. Export as conda-linux-64-core.lock

Is there any progress on this? Anything I could do to help?

maresb avatar Jul 05 '21 07:07 maresb

The most pressing need is a way to specify this that is compatible with, and accepted by, the tools in the ecosystem (conda, mamba, conda-lock).

We don't need to "install" anything actually. The solver output of conda / mamba for the larger set of dependencies should be sufficient to already pin the subsets. You get a tree and you can leave out all branches that don't end up in one of the remaining directly listed dependencies.
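As a rough sketch of that pruning step, assuming the solver output has already been reduced to a plain mapping from package name to dependency names (a simplified stand-in, not the real conda/mamba output format), the subset is the set of packages reachable from the remaining direct dependencies. The function name `prune_to_subset` and the toy package data are made up for illustration.

```python
from collections import deque


def prune_to_subset(depends_on, roots):
    """Return the packages reachable from `roots` in an already-solved graph.

    depends_on: dict mapping each solved package name to the names it depends on
    (a simplified, hypothetical stand-in for real solver output).
    """
    keep = set()
    queue = deque(roots)
    while queue:
        name = queue.popleft()
        if name in keep:
            continue  # already visited via another path
        keep.add(name)
        queue.extend(depends_on.get(name, ()))
    return keep


# Toy solve for the full environment (illustrative names, not real metadata).
full_solve = {
    "click": ["python"],
    "jupyterlab": ["python", "tornado"],
    "tornado": ["python"],
    "python": [],
}
core = prune_to_subset(full_solve, ["click"])
print(sorted(core))  # ['click', 'python']
```

Because the subset is cut out of the full solve rather than solved again, every package it keeps has the same pinned version as in the full lock.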

xhochy avatar Jul 05 '21 07:07 xhochy

We don't need to "install" anything actually.

Right, I actually meant "install" in the sense of "ask the conda/mamba solver which packages would be required to install"

You get a tree and you can leave out all branches that don't end up in one of the remaining directly listed dependencies.

Is it really a tree? I assume nodes would be packages, but then what would be the unique parent function? In any case it looks like I'd need to invest some time into understanding the data structures.

@xhochy, did you ever start working on this? Any partial progress to share?

maresb avatar Jul 05 '21 07:07 maresb

@xhochy, did you ever start working on this? Any partial progress to share?

No, the important starting point for me would be to have consensus on the specification. The implementation itself is (from my perspective) a less complicated issue.

xhochy avatar Jul 05 '21 07:07 xhochy

By "specification" do you mean a way to indicate dev dependencies such as using # [dev] as a label? Or do you mean something else?

maresb avatar Jul 05 '21 08:07 maresb

By "specification" do you mean a way to indicate dev dependencies such as using # [dev] as a label?

Yes.

xhochy avatar Jul 05 '21 08:07 xhochy

I don't understand exactly what you mean by --subset "core|dev=0", but I'd trust your judgement, and it sounds like you have something clever in mind. I really like your proposal, and so do 7 others according to the :+1:'s. That looks quite positive to me.

What sort of consensus are you looking for exactly? I wonder if we could somehow push the issue?

maresb avatar Jul 05 '21 08:07 maresb

I don't understand exactly what you mean by --subset "core|dev=0"

This is the one thing I really don't like about my suggestion. It is hard to specify the subset on the CLI in an understandable manner. --subset "core|dev=0" boils down to:

  • --subset: In addition to a full lock, also pin for this sub selection of packages
  • core: Name that selection core
  • dev=0: The subset is selected by setting the environment marker dev to false.
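For illustration, the three parts above could be parsed roughly as follows. This is only a sketch of the proposed syntax: `parse_subset` is a hypothetical name, and the comma-separated form for several markers is an assumption that is not part of the proposal.

```python
def parse_subset(spec):
    """Parse 'NAME|marker=0' into (name, {marker: bool})."""
    name, sep, conditions = spec.partition("|")
    if not name or not sep:
        raise ValueError(f"expected NAME|marker=0 or NAME|marker=1, got {spec!r}")
    markers = {}
    # Assumption: several markers could be comma-separated, e.g. "core|dev=0,docs=0".
    for condition in conditions.split(","):
        key, eq, value = condition.partition("=")
        if not key or not eq or value not in ("0", "1"):
            raise ValueError(f"bad marker condition {condition!r} in {spec!r}")
        markers[key] = value == "1"
    return name, markers


name, markers = parse_subset("core|dev=0")
print(name, markers)  # core {'dev': False}
```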

Also, I had a discussion with @mariusvniekerk outside this issue where he expressed a certain dislike regarding selectors. @mariusvniekerk Can you maybe expand that here a bit?

xhochy avatar Jul 05 '21 18:07 xhochy

My general problem with selectors is that they are an abnormal way to add logic into yaml.

Additionally, conda-lock supports reading deps from a few more sources that exist outside of the conda ecosystem.

I'm perfectly fine with just sticking a few more keys into the yaml (and the pyproject.toml etc) and parsing it correctly.

Maybe something like

conda-lock:
   package-subsets:
       core: [foo] 

As a means of expressing named package subsets.

mariusvniekerk avatar Jul 07 '21 22:07 mariusvniekerk

@mariusvniekerk, to clarify, would your [foo] be a list of packages, each of which must also occur in dependencies?

maresb avatar Jul 07 '21 22:07 maresb

Yep those packages and obviously all their deps from the resulting graph

mariusvniekerk avatar Jul 08 '21 12:07 mariusvniekerk

Additionally a specification like this allows us to rather easily support it entirely on the command-line as well

conda-lock --package-subset core=foo,bar --package-subset test=foo,bar,pytest
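A minimal sketch of how such a repeatable NAME=pkg1,pkg2 option could be parsed with Python's standard argparse module. The `--package-subset` flag is the proposal from this thread, not an existing conda-lock option, and the parser below (including the program name `conda-lock-sketch`) is purely illustrative.

```python
import argparse


def parse_package_subset(value):
    """Turn 'core=foo,bar' into ('core', ['foo', 'bar'])."""
    name, sep, packages = value.partition("=")
    if not sep or not name or not packages:
        raise argparse.ArgumentTypeError(f"expected NAME=pkg1,pkg2, got {value!r}")
    return name, [pkg for pkg in packages.split(",") if pkg]


# Illustrative parser only; the real conda-lock CLI is not modelled here.
parser = argparse.ArgumentParser(prog="conda-lock-sketch")
parser.add_argument(
    "--package-subset",
    action="append",  # the option may be repeated, one named subset per use
    type=parse_package_subset,
    default=[],
)

args = parser.parse_args(
    ["--package-subset", "core=foo,bar", "--package-subset", "test=foo,bar,pytest"]
)
subsets = dict(args.package_subset)
print(subsets)  # {'core': ['foo', 'bar'], 'test': ['foo', 'bar', 'pytest']}
```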

mariusvniekerk avatar Jul 08 '21 15:07 mariusvniekerk

Yep those packages and obviously all their deps from the resulting graph

Sorry, I'm very confused by the above comment... I thought we're talking about the yaml input, which the human specifies. In Uwe's example: click,jupyterlab. Then we want some subset like core=click. But the human would never be expected to compute the dependency graph, right? I thought that's conda-lock's job. Or are you discussing the output format?

maresb avatar Jul 08 '21 16:07 maresb

I was searching for this exact feature and read about the solution pipenv is using for installation: a second file containing the development dependencies.

Maybe there is a blind spot in my solution, but what about leveraging conda-lock's mechanism for reading package names from multiple input files? The same mechanism could read a second file with development dependencies, in any of the supported formats. The command line for my suggestion could look like:

conda-lock -f environment.yml -d dev-environment.yaml

-d is the short form of --dev.

While writing, I realized the similarity to the --dev-dependencies option. So what about rethinking its meaning? If no value is provided with --dev-dependencies, the dependencies are read from the input (status quo). However, if given a list of strings, those shall be interpreted as a list of development dependencies. If given a path to a file, the file is parsed for dependencies and those are used (like the command line above). Therefore, the following command would also be valid:

conda-lock -f environment.yml -d=jupyterlab,pytest
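The three cases could be told apart roughly like this. It is only a sketch of the suggested behaviour, not how conda-lock handles the option today; `interpret_dev_option` and its return labels are hypothetical.

```python
from pathlib import Path


def interpret_dev_option(value):
    """Classify the value passed to the suggested -d/--dev option.

    Returns a (kind, items) pair; the kinds are made-up labels for this sketch.
    """
    if value is None:
        # No value given: keep reading dev dependencies from the input (status quo).
        return ("from-input", [])
    if Path(value).is_file():
        # An existing file: its dependencies would be parsed as dev dependencies.
        return ("file", [value])
    # Otherwise treat the value as a comma-separated list of package names.
    return ("packages", [pkg.strip() for pkg in value.split(",") if pkg.strip()])


print(interpret_dev_option("jupyterlab,pytest"))
```

One ambiguity in this design: a package name that happens to match an existing file name would be read as a file.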

wagnerpeer avatar Jul 16 '21 18:07 wagnerpeer

just chiming in here real quick -- (micro)mamba got a new feature that conda also has (afaik) which is --freeze-installed. that works with the SOLVER_LOCK of libsolv.

Might be useful somehow for this case: https://github.com/mamba-org/mamba/pull/1048/files

wolfv avatar Jul 22 '21 15:07 wolfv

@wagnerpeer conda-lock already supports reading from multiple files for its dependencies.

conda-lock -f environment.yml -f dev-environment.yaml

will already work.

The drawback is that in this setup, if you generate one lock for the regular version and one for the version with additional packages, the shared packages can end up with different solves, because the dev packages impose additional constraints.
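A toy illustration of that drift, using a deliberately naive "solver" that picks the newest version under every upper bound. The versions and the dev constraint are made up, and real conda/mamba solving is far more involved.

```python
def newest_allowed(available, constraints):
    """Pick the highest version below every upper bound (toy 'solver')."""
    allowed = [v for v in available if all(v < bound for bound in constraints)]
    return max(allowed)


# Made-up versions of a package shared by the core and dev environments.
available_shared = [(1, 0), (2, 0), (3, 0)]

# Solving core alone: nothing restricts the shared package.
core_only = newest_allowed(available_shared, constraints=[])

# Solving core + dev: a hypothetical dev package requires the shared package < 3.0.
with_dev = newest_allowed(available_shared, constraints=[(3, 0)])

print(core_only, with_dev)  # (3, 0) (2, 0)
```

The two independent solves disagree on the shared package, which is why the subset would have to be derived from the superset solve instead.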


@wolfv it might be useful, main thing is we don't materialize any of the solves here and atm conda-lock still has to run on stock conda, but we could potentially require mamba for these features.

I think for solvability this has to work from the superset backwards instead of growing a dependency set.

mariusvniekerk avatar Jul 22 '21 19:07 mariusvniekerk

@mariusvniekerk Totally understood this. That's why I think the mechanism of reading dependencies from a file could be an alternate option to the suggested use of selectors. But maybe this is the same as your first comment in this thread?

wagnerpeer avatar Jul 22 '21 19:07 wagnerpeer