docker-stacks

Support a workflow for extending the base environment in a reproducible fashion

Status: Open. ctcjab opened this issue 8 months ago (6 comments).

What docker image(s) is this feature applicable to?

minimal-notebook

What change(s) are you proposing?

Currently, the base conda environment cannot be extended in a reproducible fashion. Users who need to install additional packages into it have no good workflow for, e.g., maintaining a conda-lock lockfile for a custom base env built from two input files: one listing the packages that their docker-stacks base image installs, and another containing their own additions.

How does this affect the user?

Users suffer longer Docker build times because they cannot bypass the solver (shout out to https://github.com/conda/conda/issues/8372 for defeating https://conda.github.io/conda-lock/pip/ and https://conda.github.io/conda-lock/output/#explicit-lockfile), and their builds may produce a non-reproducible base env.

Anything else?

Thanks for maintaining docker-stacks!

ctcjab, Apr 25 '25 17:04

Could you please elaborate on the issue?

Why can't you use a combination of these two:

  • a commit-hash or date tag of the Docker image: this gives you the exact same image every time you use it
  • conda-lock to also lock the packages you install on top of it

If you build your own custom base environment, pin it with a commit hash or date as well. I've just checked: conda-lock v3 was recently released and added mamba v2 support, so it should work well with our images: https://github.com/conda/conda-lock/releases/tag/v3.0.0
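As a small illustration of the first point, a CI job could refuse to build from an unpinned base image. This is a minimal POSIX-shell sketch, not part of docker-stacks: the helper name `is_pinned_image` and the example image references are hypothetical, and it only inspects the reference string. It cannot tell whether a tag will ever be re-pushed on the registry; only a `@sha256:` digest is strictly immutable.

```shell
# Hypothetical helper: succeed only if an image reference is pinned, i.e. it
# uses a digest (@sha256:...) or an explicit tag other than "latest".
is_pinned_image() {
  _ref=$1
  case $_ref in
    *@sha256:*) return 0 ;;  # content-addressed digest: the strictest pin
  esac
  # Keep only the last path component so a registry port such as
  # "localhost:5000/" is not mistaken for a tag.
  _name=${_ref##*/}
  case $_name in
    *:*) ;;                  # an explicit tag is present
    *) return 1 ;;           # no tag: Docker falls back to "latest"
  esac
  _tag=${_name##*:}
  [ -n "$_tag" ] && [ "$_tag" != "latest" ]
}
```

For example, running `is_pinned_image "$BASE_IMAGE" || exit 1` before `docker build` would reject `quay.io/jupyter/minimal-notebook` or `quay.io/jupyter/minimal-notebook:latest`, but accept a date tag such as the hypothetical `quay.io/jupyter/minimal-notebook:2025-04-21` or a digest reference.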

mathbunnyru, Apr 26 '25 23:04

conda-lock can take multiple environment.yml files as input, so you can export the base env by running conda env export > base-env.yaml in your base image, and then create a locked env with:

conda-lock lock --platform $conda_platform -f your-environment.yaml -f base-env.yaml --kind env

which you can then install in your image with

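# BASE_IMAGE must be declared before FROM; pass it with --build-arg BASE_IMAGE=<pinned image>
ARG BASE_IMAGE
FROM $BASE_IMAGE
COPY conda-*.lock.yml /tmp/
RUN mamba env update -n base -f /tmp/conda-linux-aarch64.lock.yml # or the lockfile for your arch, e.g. conda-linux-64.lock.yml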

Because the lockfile is an environment.yml, mamba still runs the solver over all the packages, so you don't get the performance benefit of a from-scratch --explicit install. It does mean, however, that only packages not already in the base env get installed.
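Both the `$conda_platform` value passed to `conda-lock lock` and the lockfile name used in the Dockerfile depend on the target architecture. A minimal POSIX-shell sketch of that mapping follows; the helper names `conda_platform` and `lockfile_for` are hypothetical, only the two Linux architectures shown are covered, and it assumes the `conda-<platform>.lock.yml` naming seen in the Dockerfile snippet for `--kind env` output. It also accepts Docker's `amd64`/`arm64` spellings, so a BuildKit `TARGETARCH` value can be passed in.

```shell
# Hypothetical helper: map a machine architecture (default: `uname -m`)
# to the conda platform/subdir name used by `conda-lock lock --platform`.
conda_platform() {
  _arch=${1:-$(uname -m)}
  case $_arch in
    x86_64 | amd64) echo linux-64 ;;        # Docker/BuildKit calls this amd64
    aarch64 | arm64) echo linux-aarch64 ;;  # Docker/BuildKit calls this arm64
    *)
      echo "unsupported architecture: $_arch" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: name of the `--kind env` lockfile for that platform,
# e.g. conda-linux-64.lock.yml
lockfile_for() {
  _platform=$(conda_platform "$1") || return 1
  echo "conda-${_platform}.lock.yml"
}
```

For example, `conda-lock lock --platform "$(conda_platform)" ...` locks for the current machine, and `lockfile_for arm64` prints `conda-linux-aarch64.lock.yml`.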

A full working sample is here.

minrk, Apr 28 '25 07:04

Thanks, I'm already using a similar workflow to work around this, but bypassing the solver at build time is the main thing I'm after here (see the first sentence in the issue description under "How does this affect the user?").

My suggestion would be:

  1. This project should use conda-lock to maintain lockfiles for the base environment in the provided images, thereby also benefiting from bypassing the solver at build time
  2. Along with the images that are published as release artifacts, also publish the associated conda-lock input files, so that consumers could include those along with their own additional input files to generate a lockfile for their extended base environment

ctcjab, Apr 29 '25 17:04

I’m not aware of a tool that can update an existing env without a solve and without reinstalling the env. A sync/update command would be a good feature request for conda-lock, but I don’t think it exists (https://github.com/conda/conda-lock/issues/751), so conda-lock alone doesn’t solve that problem, and it doesn’t work for images that inherit from each other, like these stacks.

Maybe pixi could do it? I’m not sure.

minrk, Apr 29 '25 18:04

Interesting. Were you thinking of migrating these images from conda to pixi for other reasons? I didn't see an issue for that yet, and I'd be interested in at least following a discussion of the pros and cons.

ctcjab, Apr 29 '25 19:04

Not particularly; I was only noting that I am not aware of a tool that solves the problem you described (specifically, a lock plus no build-time solve for layered installs). Pixi is just the only one I know of that might. I haven’t proposed moving to it or investigated in sufficient detail how it might do so.

minrk, Apr 29 '25 21:04