
Feature request: Upsert (smarter conda-lock install)

Open • jab opened this issue 1 year ago • 14 comments

Checklist

  • [x] I added a descriptive title
  • [x] I searched open requests and couldn't find a duplicate

What is the idea?

Given a conda environment name and a conda-lock.yml, I would like to be able to run something like “conda-lock upsert -n envname conda-lock.yml”. It would first check whether an existing environment with this name already matches the lockfile. If so, it does nothing; if not, it updates the environment to match the lockfile.
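The requested behavior boils down to a check-then-act flow. Here is a minimal Python sketch of it. `upsert`, `env_matches` and `install` are hypothetical names, not part of conda-lock's API. `install` could, for instance, shell out to `conda-lock install -n envname conda-lock.yml`.

```python
from typing import Callable


def upsert(
    env_name: str,
    lockfile: str,
    env_matches: Callable[[str, str], bool],
    install: Callable[[str, str], None],
) -> bool:
    """Bring env_name in line with lockfile; return True if an install ran.

    Hypothetical helper: env_matches and install are supplied by the caller.
    """
    # Fast path: the existing environment already matches, so do nothing.
    if env_matches(env_name, lockfile):
        return False
    # Slow path: (re)install so that the environment matches the lockfile.
    install(env_name, lockfile)
    return True
```

Passing the check and the install step in as callables keeps the control flow independent of how "matches" is eventually defined.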

Why is this needed?

Currently, the best you can do to ensure that a particular named environment matches a lockfile is to delete any existing environment with that name and then run conda-lock install to create a new one from the lockfile.

Besides wasting time when an existing environment already matches the lockfile, this is an even bigger problem when a needless conda-lock install fails due to a transient error, leaving the user without any working environment until the error goes away. (This recently happened to my users due to intermittent, spurious checksum mismatch errors.)

What should happen?

(Ideally, this would create the new environment in a temporary directory and move it over any existing environment only after all packages installed successfully, making the operation more atomic. But even just the ability to exit early when an existing environment already matches the lockfile would be a big win.)

Additional Context

No response

jab • Nov 14 '24 21:11

Thanks @jab for the suggestion! This would be a great feature, but development time is pretty scarce here. I'd recommend either:

  • Look into pixi. It's a lot more polished, very actively being developed, and already does clever things regarding updating environments.
  • If you want to stick with conda-lock, is this a feature you'd be interested in contributing?

maresb • Nov 14 '24 21:11

Thanks! This is for work where we’re already standardized on conda and switching to pixi for this use case is not something I can do independently or immediately, but it is certainly something I can look into more.

In the meantime, I’m pressed for time too, but is there a “quick and dirty” approach you could describe that I could try to implement in a couple of hours tomorrow? If there's a cheap way to compare the “conda list…” output (possibly hashed into a single value?) against a hash derived from the lockfile spec, let me know and I'd be happy to give it a shot.
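The "hash into a single value" idea could look like the sketch below. It assumes both sides have already been reduced to the same kind of identifier (for example dist names such as zstd-1.5.6-ha6fb4c9_0); `fingerprint` is a hypothetical helper name.

```python
import hashlib
from typing import Iterable


def fingerprint(dist_names: Iterable[str]) -> str:
    """Collapse package dist names into one order-independent SHA-256 hex digest."""
    # Sort and de-duplicate so that listing order never affects the result.
    canonical = "\n".join(sorted(set(dist_names)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A single hash only says "match" or "no match". Comparing the two sets directly is just as cheap and also tells you which packages differ, so the hash mainly helps if you want to store the value somewhere.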

jab • Nov 15 '24 00:11

> is there a “quick and dirty” approach you could describe that I could maybe try to implement in a couple hours tomorrow?

That's a really good question...

If you don't care about diffing the individual dependencies, that helps a lot. Your best bet is probably conda list --json, which produces output like this:

[
    {
        "base_url": "https://conda.anaconda.org/conda-forge",
        "build_number": 1,
        "build_string": "py312hef9b889_1",
        "channel": "conda-forge",
        "dist_name": "zstandard-0.23.0-py312hef9b889_1",
        "name": "zstandard",
        "platform": "linux-64",
        "version": "0.23.0"
    },
    {
        "base_url": "https://conda.anaconda.org/conda-forge",
        "build_number": 0,
        "build_string": "ha6fb4c9_0",
        "channel": "conda-forge",
        "dist_name": "zstd-1.5.6-ha6fb4c9_0",
        "name": "zstd",
        "platform": "linux-64",
        "version": "1.5.6"
    }
]

Note that this will ignore any pip dependencies in your environment. I hope this is okay?
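Reading that JSON into a set of dist names could be sketched as follows. Two assumptions: the environment is addressed by name with `-n`, and pip-installed entries are recognizable by a `channel` of "pypi", so they are skipped to compare conda packages only. `list_installed` and `installed_dist_names` are hypothetical helper names. The `tool` parameter only exists to make swapping the listing command easy.

```python
import json
import subprocess
from typing import Any, Dict, List, Set


def list_installed(env_name: str, tool: str = "conda") -> List[Dict[str, Any]]:
    """Return the records printed by `<tool> list -n <env_name> --json`."""
    result = subprocess.run(
        [tool, "list", "-n", env_name, "--json"],
        check=True,           # raise if the listing command fails
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def installed_dist_names(records: List[Dict[str, Any]]) -> Set[str]:
    """Collect dist_name for conda packages, skipping pip-installed entries."""
    # Assumption: pip-installed packages show up with channel "pypi".
    return {r["dist_name"] for r in records if r.get("channel") != "pypi"}
```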

You may need to compute the current platform (e.g. linux-64), and for that you can use this. (Oh, or quicker and dirtier, just parse it from the above output!!! 😂)
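The "quicker and dirtier" route of reading the platform from the listing could be sketched like this. It assumes that noarch and pip entries report "noarch" or "pypi" instead of a real platform, and that every other record in one environment shares the same platform. `platform_from_records` is a hypothetical name.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def platform_from_records(records: List[Dict[str, Any]]) -> Optional[str]:
    """Guess the conda platform (e.g. "linux-64") from `conda list --json` records."""
    # Assumption: noarch and pip entries carry no useful platform, so skip them.
    counts = Counter(
        r["platform"]
        for r in records
        if r.get("platform") not in (None, "noarch", "pypi")
    )
    if not counts:
        return None
    # An environment targets a single platform, so the most common value wins.
    return counts.most_common(1)[0][0]
```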

You should be able to parse the lockfile, extract the packages for the current platform, parse the URLs for the filenames, strip the .conda or .tar.bz2 extension, and compare the result with dist_name from the JSON above.
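Those steps might look like the sketch below. It assumes the lockfile has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`) and uses conda-lock's unified format, with a top-level `package` list whose entries carry `manager`, `platform` and `url` keys. The function names are hypothetical.

```python
import posixpath
from typing import Any, Dict, Set
from urllib.parse import urlparse

# Conda package archives come in two formats.
_EXTENSIONS = (".conda", ".tar.bz2")


def dist_name_from_url(url: str) -> str:
    """Turn a package URL into a dist name by stripping the archive extension."""
    filename = posixpath.basename(urlparse(url).path)
    for ext in _EXTENSIONS:
        if filename.endswith(ext):
            return filename[: -len(ext)]
    raise ValueError(f"unrecognized package archive: {filename}")


def locked_dist_names(lock: Dict[str, Any], platform: str) -> Set[str]:
    """Dist names of the conda (not pip) packages locked for one platform."""
    return {
        dist_name_from_url(pkg["url"])
        for pkg in lock["package"]
        if pkg["manager"] == "conda" and pkg["platform"] == platform
    }


def env_matches_lock(installed: Set[str], lock: Dict[str, Any], platform: str) -> bool:
    """True if the installed conda packages are exactly the locked ones."""
    return installed == locked_dist_names(lock, platform)
```

If you install only some categories (for example, without the optional dev packages), the locked set would also need filtering on the entries' category fields. Otherwise the comparison will always report a mismatch.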

I would recommend avoiding mamba list or micromamba list at this time because they recently released v2, and there are still a bunch of bugs with the JSON output. (I'm currently waiting for these issues to be resolved before releasing conda-lock v3.)

I hope that approach fits with your time constraints, and I hope I didn't forget anything important.

maresb • Nov 15 '24 08:11

Thank you for the suggestion! I have incorporated it into our conda-lock install-based automation and it has been working well. Please consider this a vote of confidence that this approach could work well if built into conda-lock itself.

ctcjab • Feb 12 '25 19:02

Great, I'm glad you were able to make this work!

While not directly related, the ideas from here may be helpful in #639. (Adding the cross-reference for discoverability.)

maresb • Feb 12 '25 20:02

Strange: I picked up an upgrade from libmamba 2.1.0 to 2.1.1 (released yesterday) and it broke this "environment matches lockfile?" logic.

Judging from this diff, I think it may be because conda-lock install installs packages from PyPI rather than from a conda channel when libmamba 2.1.1 is installed instead of 2.1.0.

Once I realized that pinning libmamba to 2.1.0 fixes this, I didn't investigate further, but I wanted to at least post here in case it warrants further investigation or a bug report to libmamba.

ctcjab • May 06 '25 17:05

Yikes, thanks @ctcjab for the heads up.

It is indeed strange that it's using PyPI. I wonder if it's a regression.

maresb • May 06 '25 17:05

@ctcjab, are you still seeing this behavior with the latest versions of libmamba and conda-lock? If so, let's open a separate issue.

maresb • Jun 14 '25 15:06

Unfortunately yes. Just tested again with latest libmamba (currently 2.2.0) and same version of conda-lock (2.5.8), and reproduced the behavior where the "environment matches lockfile?" logic no longer works.

ctcjab • Jun 15 '25 20:06

Hi folks,

Although matching the lock file contents against the installed environment is a sound concept, I suspect its implementation would be complex. Also, the command might require considerable time to execute before confirming the environment and lock file are in sync and no action is needed.

What if we adopted a simpler approach: stamping an environment to indicate its installation originated from a specific lock file?

For instance:

  1. When conda-lock install some-lock.yml is used to create an environment initially, it would also generate a new file within that environment to record that the environment originated from some-lock.yml. We could use a file at etc/conda/conda-lock-info.json within the environment:

    {
        "conda-lock": {
            "lock-file": {
                "hash": "012312deadbeef"
            }
        }
    }
    
  2. On subsequent executions of conda-lock install some-lock.yml, conda-lock could check for this file and compare its recorded hash against the hash of some-lock.yml. If they match, no action would be required. If the file is absent or the hashes differ, the current installation process would proceed.
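The two steps above could be sketched in Python as follows. The stamp location and JSON layout follow the proposal. SHA-256 over the raw lockfile bytes is an assumption, since the proposal just says "hash", and it means even cosmetic edits to the lockfile invalidate the stamp. `write_stamp` and `stamp_matches` are hypothetical names, and `write_stamp` should only run after an install has fully succeeded.

```python
import hashlib
import json
from pathlib import Path

# Location proposed in this thread (not an existing conda-lock convention).
STAMP_RELPATH = Path("etc") / "conda" / "conda-lock-info.json"


def lockfile_hash(lockfile: Path) -> str:
    """SHA-256 of the lockfile's raw bytes."""
    return hashlib.sha256(lockfile.read_bytes()).hexdigest()


def write_stamp(prefix: Path, lockfile: Path) -> None:
    """Record which lockfile the environment at `prefix` was built from."""
    stamp = prefix / STAMP_RELPATH
    stamp.parent.mkdir(parents=True, exist_ok=True)
    payload = {"conda-lock": {"lock-file": {"hash": lockfile_hash(lockfile)}}}
    stamp.write_text(json.dumps(payload, indent=2))


def stamp_matches(prefix: Path, lockfile: Path) -> bool:
    """True only if a stamp exists and records the current lockfile's hash."""
    stamp = prefix / STAMP_RELPATH
    if not stamp.is_file():
        return False
    try:
        recorded = json.loads(stamp.read_text())["conda-lock"]["lock-file"]["hash"]
    except (ValueError, KeyError, TypeError):
        # A corrupt or unexpected stamp is treated like a missing one.
        return False
    return recorded == lockfile_hash(lockfile)
```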

While this method wouldn't detect discrepancies if a user directly modified the environment (e.g., by running conda install sqlalchemy without conda-lock), this seems like a beneficial trade-off given its simplicity and speed of implementation.

Moreover, this approach doesn't prevent a more comprehensive implementation from being developed later, where conda-lock could, in cases of hash mismatch, proceed to inspect the installed packages and compare them with the lock file's contents.

nicoddemus • Jun 16 '25 12:06

Thanks for chiming in with the good idea, happy to see this getting more attention.

I thought about something like this too. I'm all for any kind of even more modest / incremental improvements that could make conda-lock install short-circuit more quickly for environments that already match the given lockfile, especially if it means conda-lock could ship something sooner and be less vulnerable to potential bugs in libmamba.

But I'd be concerned about this performance improvement coming at the expense of correctness:

> this method wouldn't detect discrepancies if a user directly modified the environment

I.e., conda-lock install should never be allowed to succeed (quickly or otherwise) in the case that the named environment does not exactly match the given lockfile. At least not without passing some flag like --enable-incorrect-short-circuiting that users would have to opt into and that makes the caveats more obvious.
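One way to reconcile the stamp idea with this correctness concern is to trust the stamp only when the user opts in, and otherwise always fall back to the full package comparison. A minimal sketch follows. `can_skip_install` and `trust_stamp` are hypothetical. `trust_stamp` stands in for something like the flag suggested above, which is not an existing conda-lock option.

```python
from typing import Callable


def can_skip_install(
    stamp_ok: Callable[[], bool],
    contents_ok: Callable[[], bool],
    trust_stamp: bool = False,
) -> bool:
    """Decide whether `conda-lock install` may exit early without installing."""
    # Opt-in only: trust the stamp alone, accepting that manual changes
    # (e.g. a later `conda install sqlalchemy`) go unnoticed.
    if trust_stamp and stamp_ok():
        return True
    # Default: skip only if the installed packages really match the lockfile.
    return contents_ok()
```

The checks are passed as callables so that the slower package comparison runs only when the stamp cannot settle the question.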

> Also, the command might require considerable time to execute before confirming the environment and lock file are in sync and no action is needed.

FWIW, the "environment matches lockfile?" logic I implemented based on https://github.com/conda/conda-lock/issues/751#issuecomment-2478270012 takes only a few seconds to complete, including for large conda environments with 600+ dependencies in their transitive closures.

ctcjab • Jun 16 '25 13:06

> At least not without passing some flag like --enable-incorrect-short-circuiting that users would have to opt into and that makes the caveats more obvious.

Fair enough, I agree.

> FWIW, the "environment matches lockfile?" logic I implemented based on https://github.com/conda/conda-lock/issues/751#issuecomment-2478270012 takes only a few seconds to complete, including for large conda environments with 600+ dependencies in their transitive closures.

A few seconds seems reasonable to me too.

nicoddemus • Jun 16 '25 13:06

Regarding the PyPI vs. conda-forge discrepancy I mentioned above (and repro'd here), thank you @maresb for suggesting the workaround of using mamba list rather than conda list; that did the trick!

> @ctcjab, are you still seeing this behavior with the latest versions of libmamba and conda-lock? If so, let's open a separate issue.

I'd be keen to make sure there's a dedicated issue for this so I can follow progress; I'm just not sure where to create it or what to put in it. (Upgrading to libmamba>2.1.0 breaks conda list, but not mamba list?) If you have a good enough sense of what's going on to file a better issue than I could, please do, and thanks so much again for figuring out this workaround!

ctcjab • Jun 23 '25 20:06

Now that I've confirmed that the issue is upstream, it would indeed be nice to clean up this issue by migrating the corresponding comments there. I need to look a little bit more deeply to figure out whether the issue is in conda or mamba. I'll try to get to that within a few days.

maresb • Jun 24 '25 08:06