Pinning subset requirements
When developing, we install many more dependencies than at runtime. We would still like to pin the runtime and the development/CI dependencies to exactly the same versions. This should result in two lockfiles where one is a subset of the other.
Given the following `environment.yml`:
```yaml
name: nyc-taxi-fare-prediction-deployment-example
channels:
  - conda-forge
  - nodefaults
dependencies:
  - click
  - jupyterlab  # [dev]
```
and the following command: `conda-lock -f environment.yml --subset "core|dev=0"`
I would expect two files, `conda-linux-64.lock` and `conda-linux-64-core.lock`, where in the latter jupyterlab and its dependencies are omitted but all other packages are pinned to the same versions.
Does this sound like a reasonable proposal? If so, I would start working on this.
so the way we can do this atm is by passing multiple files using the compound specifications:

```
conda-lock -f base.yml -f specific.yml -p linux-64 --filename-format "specific-{platform}.lock"
```
in the general case the additional dependency can invalidate the entire solve though, so I'm not sure we can just have these be atomic
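For concreteness, a hypothetical split of the example above into two such files might look like this (the file names and the dependencies-only overlay file are illustrative):

```yaml
# base.yml -- runtime dependencies only (illustrative)
name: nyc-taxi-fare-prediction-deployment-example
channels:
  - conda-forge
  - nodefaults
dependencies:
  - click
```

```yaml
# specific.yml -- additional dev/CI dependencies, layered on top of base.yml (illustrative)
dependencies:
  - jupyterlab
```

Locking `base.yml` alone and then `base.yml` together with `specific.yml` yields the two lockfiles, but as noted above the two solves are independent and may pin shared packages differently.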
Just a small comment to say that such a feature would be awesome to have on the conda and mamba side also when creating an env. See https://github.com/conda/conda/issues/10398 for something a bit different.
I really like this idea of selectors to specify a subset env.
Rather than merging the specifications successively, I'd expect that a more accurate approximation would be:

- First install all dependencies
- Export as `conda-linux-64.lock`
- Uninstall dev dependencies
- Export as `conda-linux-64-core.lock`
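A rough sketch of that approximation with stock conda (the env name is illustrative). Note that plain `conda remove` leaves the removed package's now-unneeded dependencies installed, so the second export only approximates the desired core lock:

```bash
# Solve and install the full set of dependencies, dev included.
conda env create -n subset-demo -f environment.yml
conda list -n subset-demo --explicit > conda-linux-64.lock

# Drop the dev-only package again and export the remainder.
# Caveat: jupyterlab's orphaned dependencies stay behind.
conda remove -n subset-demo jupyterlab
conda list -n subset-demo --explicit > conda-linux-64-core.lock
```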
Is there any progress on this? Anything I could do to help?
The most pressing thing where help is needed is to find a way to specify this that is compatible with and accepted by the tools in the ecosystem (conda, mamba, conda-lock).
We don't need to "install" anything, actually. The solver output of conda/mamba for the larger set of dependencies should be sufficient to already pin the subsets. You get a tree and you can leave out all branches that don't end up in one of the remaining directly listed dependencies.
> We don't need to "install" anything actually.
Right, I actually meant "install" in the sense of "ask the conda/mamba solver which packages would be required to install"
> You get a tree and you can leave out all branches that don't end up in one of the remaining directly listed dependencies.
Is it really a tree? I assume nodes would be packages, but then what would be the unique parent function? In any case it looks like I'd need to invest some time into understanding the data structures.
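To make the graph point concrete, here is a minimal, hypothetical sketch of the pruning step in Python. It assumes the solver output has been massaged into a plain mapping from package name to its resolved dependencies (not an actual conda/mamba data structure), and it treats the result as what it really is, a DAG rather than a tree, since packages can share dependencies:

```python
def prune_to_subset(solve: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Return every package reachable from the subset's direct deps (`roots`).

    `solve` is a hypothetical stand-in for the solver output: a mapping from
    each resolved package to the packages it depends on.
    """
    keep: set[str] = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg in keep:
            continue  # shared dependency already visited -- this is a DAG, not a tree
        keep.add(pkg)
        stack.extend(solve.get(pkg, []))
    return keep


# Toy solve (made-up dependency metadata) for the example env: pruning to the
# "core" root `click` drops jupyterlab and its dev-only deps, keeping shared ones.
solve = {
    "click": ["python"],
    "jupyterlab": ["python", "jinja2"],
    "jinja2": ["python"],
    "python": [],
}
assert prune_to_subset(solve, ["click"]) == {"click", "python"}
```

Since every kept package keeps exactly the version from the superset solve, the core lock is a true subset of the full lock by construction.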
@xhochy, did you ever start working on this? Any partial progress to share?
> @xhochy, did you ever start working on this? Any partial progress to share?
No, the important starting point for me would be to have consensus on the specification. The implementation itself is (from my perspective) the less complicated issue.
By "specification" do you mean a way to indicate dev dependencies such as using # [dev]
as a label? Or do you mean something else?
By "specification" do you mean a way to indicate dev dependencies such as using # [dev] as a label?
Yes.
I don't understand exactly what you mean by `--subset "core|dev=0"`, but I'd trust your judgement, and it sounds like you have something clever in mind. I really like your proposal, and so do 7 others according to the :+1:'s. That looks quite positive to me.
What sort of consensus are you looking for exactly? I wonder if we could somehow push the issue?
> I don't understand exactly what you mean by `--subset "core|dev=0"`
This is the one thing I really don't like about my suggestion. It is hard to specify the subset on the CLI in an understandable manner. `--subset "core|dev=0"` boils down to:

- `--subset`: In addition to a full lock, also pin for this sub-selection of packages
- `core`: Name that selection `core`
- `dev=0`: The subset is selected by setting the environment marker `dev` to `false`.
Also, I had a discussion with @mariusvniekerk outside this issue where he expressed a certain dislike regarding selectors. @mariusvniekerk Can you maybe expand that here a bit?
My general problem with selectors is that they are an abnormal way to add logic into yaml.
Additionally, conda-lock supports reading deps from a few more sources that exist outside of the conda ecosystem.
I'm perfectly fine with just sticking a few more keys into the yaml (and the pyproject.toml etc) and parsing it correctly.
Maybe something like

```yaml
conda-lock:
  package-subsets:
    core: [foo]
```

as a means of expressing named package subsets.
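Applied to the example at the top of the thread, that could look like this (hypothetical syntax, not something conda-lock currently understands):

```yaml
name: nyc-taxi-fare-prediction-deployment-example
channels:
  - conda-forge
  - nodefaults
dependencies:
  - click
  - jupyterlab
conda-lock:
  package-subsets:
    core: [click]  # jupyterlab and its deps would be omitted from the core lock
```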
@mariusvniekerk, to clarify, would your `[foo]` be a list of packages, each of which must also occur in `dependencies`?
Yep those packages and obviously all their deps from the resulting graph
Additionally, a specification like this allows us to rather easily support it entirely on the command line as well:

```
conda-lock --package-subset core=foo,bar --package-subset test=foo,bar,pytest
```
> Yep those packages and obviously all their deps from the resulting graph
Sorry, I'm very confused by the above comment... I thought we're talking about the yaml input, which the human specifies. In Uwe's example: `click,jupyterlab`. Then we want some subset like `core=click`. But the human would never be expected to compute the dependency graph, right? I thought that's conda-lock's job. Or are you discussing the output format?
I was searching for this exact feature and read about the solution pipenv is using for installation: a second file containing the development dependencies.
Maybe there is a blind spot in my solution, but what about leveraging conda-lock's existing mechanism of reading package names from multiple input files? The same mechanism could read a second file with development dependencies, where the file can have any of the supported formats. The command line for my suggestion could look like:
```
conda-lock -f environment.yml -d dev-environment.yaml
```

`-d` is the short form of `--dev`.
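A hypothetical `dev-environment.yaml` for the example above might then simply be:

```yaml
# dev-environment.yaml -- development-only additions (illustrative)
dependencies:
  - jupyterlab
  - pytest
```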
While writing this, I realize the similarity to the `--dev-dependencies` option. So what about re-thinking its meaning?

If no value is provided with `--dev-dependencies`, the dependencies are read from the input (status quo). However, if given a list of strings, those are interpreted as a list of development dependencies. If given a path to a file, the file is parsed for dependencies and those are used (like in the command line above).
Therefore, the following command would also be valid:

```
conda-lock -f environment.yml -d=jupyterlab,pytest
```
just chiming in here real quick -- (micro)mamba got a new feature that conda also has (afaik), which is `--freeze-installed`. That works with the `SOLVER_LOCK` of libsolv. Might be useful somehow for this case: https://github.com/mamba-org/mamba/pull/1048/files
@wagnerpeer conda-lock already supports reading from multiple files for its dependencies. `conda-lock -f environment.yml -f dev-environment.yaml` will already work.
The drawback there is that if you generate a lock for both the regular version and the version with additional packages, the shared packages can end up with different solves due to changes in constraints imposed by the dev packages.
@wolfv it might be useful; the main thing is we don't materialize any of the solves here, and atm conda-lock still has to run on stock conda, but we could potentially require mamba for these features.
I think for solvability this has to work from the superset backwards instead of growing a dependency set. For example, if a dev-only tool constrains a shared dependency to an older version, solving the core set on its own could pick a newer version, and the core lock would no longer be a subset of the dev lock.
@mariusvniekerk Totally understood. That's why I think the mechanism of reading dependencies from a file could be an alternative to the suggested use of selectors. But maybe this is the same as your first comment in this thread?