# Move CI setup to pixi
Currently, when the CI starts on Linux, it does a number of things:

```bash
mamba install --update-specs --yes --quiet --channel conda-forge --strict-channel-priority \
    pip mamba rattler-build conda-forge-ci-setup=4 "conda-build>=24.1"
mamba update --update-specs --yes --quiet --channel conda-forge --strict-channel-priority \
    pip mamba rattler-build conda-forge-ci-setup=4 "conda-build>=24.1"
```
On macOS / Windows it would first set up Miniforge, and then install the same:

macOS script:

```bash
MINIFORGE_URL="https://github.com/conda-forge/miniforge/releases/latest/download"
MINIFORGE_FILE="Mambaforge-MacOSX-$(uname -m).sh"
curl -L -O "${MINIFORGE_URL}/${MINIFORGE_FILE}"
rm -rf ${MINIFORGE_HOME}
bash $MINIFORGE_FILE -b -p ${MINIFORGE_HOME}

( endgroup "Installing a fresh version of Miniforge" ) 2> /dev/null
( startgroup "Configuring conda" ) 2> /dev/null

source ${MINIFORGE_HOME}/etc/profile.d/conda.sh
conda activate base
export CONDA_SOLVER="libmamba"
export CONDA_LIBMAMBA_SOLVER_NO_CHANNELS_FROM_INSTALLED=1

mamba install --update-specs --quiet --yes --channel conda-forge --strict-channel-priority \
    pip mamba conda-build boa conda-forge-ci-setup=4
mamba update --update-specs --yes --quiet --channel conda-forge --strict-channel-priority \
    pip mamba conda-build boa conda-forge-ci-setup=4
```
Windows script:

```bat
:: Activate the base conda environment
call activate base

:: Configure the solver
set "CONDA_SOLVER=libmamba"
if !errorlevel! neq 0 exit /b !errorlevel!
set "CONDA_LIBMAMBA_SOLVER_NO_CHANNELS_FROM_INSTALLED=1"

:: Provision the necessary dependencies to build the recipe later
echo Installing dependencies
mamba.exe install "python=3.10" pip mamba conda-build boa conda-forge-ci-setup=4 -c conda-forge --strict-channel-priority --yes
if !errorlevel! neq 0 exit /b !errorlevel!
```
This could all be done in one swift step with pixi. Pixi is a single binary that can be dropped anywhere, and it can either resolve + install on the fly, or use a lockfile for even faster and more controlled installation.
We could create / maintain the pixi.toml + lockfile externally to the feedstocks.
Pixi + rattler-build have the added benefit that they share the cache (repodata & packages) but that is a minor concern.
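As a rough sketch, the whole provisioning step could then look something like this (the installer URL is pixi's documented one; the manifest location is an assumption, not a decided layout):

```bash
# Drop the single pixi binary in place and provision the build tools from a
# centrally maintained manifest + lockfile in one step.
curl -fsSL https://pixi.sh/install.sh | bash
export PATH="${HOME}/.pixi/bin:${PATH}"  # default install location of the binary
pixi install --manifest-path .ci_support/pixi.toml  # manifest path is an assumption
```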
We would need to accommodate remote_ci_setup, which allows modifying the base environment. So conda-smithy would need to encode something like this:
```jinja
{% for pkg in remote_ci_setup %}
pixi add "{{ pkg }}"
{% endfor %}
pixi install
```
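For instance, with a hypothetical `remote_ci_setup` of two packages, that loop would render to:

```bash
pixi add "conda-forge-ci-setup=4"  # packages shown are illustrative
pixi add "conda-build>=24.1"
pixi install
```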
An argument for keeping Miniforge around is that we were installing a conda + Python distribution anyway, so why not just use that; but that might not be as relevant these days.
conda-smithy has had conda_install_tool to control these things for a while now, so you could even start the PR and let people opt in by changing that in conda-forge.yml.
micromamba is a single binary too; it would be a smaller perturbation of the existing code and would likely get us most of the extra efficiencies here.
Note that any of these changes will affect all folks who run build-locally.py as well, which is an important consideration since the code runs not just in CI but on people's machines.
I don't think this is a tool question but rather a question of how we approach installing versions. In the end, that choice will also lead to different tools, as each is better tailored to one of the approaches. I think our choice is between:
- using the latest available versions (and thus *mamba), or
- using the exact pinned versions (and thus pixi).
Pinning brings speed and reproducibility at the cost of maintaining lockfiles, especially in the case where you have remote_ci_setup. Personally, I see that overhead as manageable, since updating the lockfiles could be handled as part of a conda-smithy rerender.
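As a sketch of what that could look like (assuming pixi's `update` subcommand and a smithy-managed manifest path, both of which are assumptions here):

```bash
# During a conda-smithy rerender, re-resolve the pinned build-tool
# environment and commit the refreshed lockfile. Paths are assumptions.
pixi update --manifest-path .ci_support/pixi.toml
git add .ci_support/pixi.toml .ci_support/pixi.lock
```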
micromamba also handles lock files, but preserves the more traditional conda env workflow many people are used to.
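For illustration, a rough sketch of that route using micromamba's documented install endpoint (the spec list just mirrors today's install line):

```bash
# Fetch the single micromamba binary and create the build-tool environment
# in one step; MINIFORGE_HOME mirrors the existing scripts.
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
./bin/micromamba create --yes --prefix "${MINIFORGE_HOME}" --channel conda-forge \
    pip rattler-build conda-forge-ci-setup=4 "conda-build>=24.1"
```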
An alternative could also be to create an installer that already contains the base installation and use that instead of Miniforge. This is closer to our current approach and also includes some locking.
> An alternative could also be to create an installer that already contains the base installation and use that instead of Miniforge.
I've been thinking about this for a bit, because right now the Miniforge release cycle is coupled to the operational status of ALL feedstocks. In the past we've been blocked by boa not being compatible with the latest conda, etc.
That said, at that point we could also put that effort into switching to a single-binary provider, be it Pixi or micromamba.
We just need to write down which packages are needed in an environment.yml or a pixi.toml, then use that to provision the "build tool environment". Lockfiles are a separate conversation, in a way.
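As a minimal sketch of such a pixi.toml (the project metadata and exact pins are illustrative, not a decided spec):

```bash
# Write a minimal manifest for the "build tool environment" and provision it.
cat > pixi.toml <<'EOF'
[project]
name = "build-tools"  # hypothetical name
channels = ["conda-forge"]
platforms = ["linux-64", "osx-64", "osx-arm64", "win-64"]

[dependencies]
pip = "*"
rattler-build = "*"
conda-build = ">=24.1"
conda-forge-ci-setup = "4.*"
EOF
pixi install  # resolves (or reuses pixi.lock) and installs
```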
The "traditional env management" point I don't get, though. We are only creating a new environment on each CI run, which happens to be base in the Mini(conda|forge) world. Even with build-locally.py it simply calls the build scripts, which means delegating to a Docker image, or even a fresh Miniforge installation on macOS. So we "only" need to point the install tool to the installation location we have used so far.
My comment on the traditional env management is in a way related to how we as developers want to maintain smithy and what our mental model for it is precisely. This interacts with users when they want to build locally and also debug builds using conda debug IIUIC.
From our core call today, the two items we seem to be tackling here are:
- Reducing the overheads of provisioning the build tool (conda-build, rattler-build) and the related tools (ci_setup, etc.). For this we will need some timings first (I'll post a table soon). The ideas are:
  - Avoiding Miniforge install times and using a single binary instead (micromamba, pixi). However, micromamba and pixi would need to download packages already included in Miniforge.
  - OR creating an installer just to provision things.
  - On top of that, we can revisit the notion of using lockfiles for feedstocks, while still allowing some flexibility or integrating it in the rerender process.
- A task orchestration tool for feedstocks, so it could replace `build-locally.py` as well as provide some convenient shortcuts for common operations like linting or rerendering.
On staged-recipes, I took the logs from https://github.com/conda-forge/staged-recipes/pull/27748
- Linux: 52s = 26s (pull Docker image) + 26s (just install deps)

  ```
  2024-10-02T15:42:53.8611837Z + docker pull quay.io/condaforge/linux-anvil-cos7-x86_64
  2024-10-02T15:43:19.2033102Z + docker run -v /home/vsts/work/1/s:/home/conda/staged-recipes -e HOST_USER_ID=1001 -e AZURE=True -e CONFIG -e CI -e CPU_COUNT -e DEFAULT_LINUX_VERSION quay.io/condaforge/linux-anvil-cos7-x86_64 bash /home/conda/staged-recipes/.scripts/build_steps.sh
  2024-10-02T15:43:20.2050451Z + conda install --quiet --file /home/conda/staged-recipes/.ci_support/requirements.txt
  2024-10-02T15:43:46.3255091Z + setup_conda_rc /home/conda/staged-recipes /home/conda/staged-recipes-copy/recipes /home/conda/staged-recipes-copy/.ci_support/linux64.yaml
  ```
- macOS: 1m30s = 3s (download Miniforge) + 20s (install Miniforge) + 67s (install deps)

  ```
  2024-10-02T15:43:06.2974070Z + curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-x86_64.sh
  2024-10-02T15:43:09.8784060Z + bash Miniforge3-MacOSX-x86_64.sh -bp /Users/runner/Miniforge3
  2024-10-02T15:43:28.3250310Z + /Users/runner/Miniforge3/bin/conda install --quiet --file .ci_support/requirements.txt
  2024-10-02T15:44:35.9780640Z + setup_conda_rc ./ ./recipes ./.ci_support/osx64.yaml
  ```
- Windows: 3m5s = 3s (download Miniforge) + 60s (install Miniforge) + 122s (install deps)

  ```
  2024-10-02T15:44:08.7940784Z Installing dependencies
  2024-10-02T15:46:14.4882501Z Setting up configuration
  ```
On a feedstock, I took the logs from libignition-physics (Unix) and mamba (Windows):
- Linux: 59s = 29s (pull Docker image) + 30s (install and update deps)

  ```
  2024-10-02T19:25:18.4151535Z + docker pull quay.io/condaforge/linux-anvil-cos7-x86_64
  2024-10-02T19:25:47.8172848Z + docker run -v /home/vsts/work/1/s/recipe:/home/conda/recipe_root:rw,z,delegated -v /home/vsts/work/1/s:/home/conda/feedstock_root:rw,z,delegated -e CONFIG -e HOST_USER_ID -e UPLOAD_PACKAGES -e IS_PR_BUILD -e GIT_BRANCH -e UPLOAD_ON_BRANCH -e CI -e FEEDSTOCK_NAME -e CPU_COUNT -e BUILD_WITH_CONDA_DEBUG -e BUILD_OUTPUT_ID -e flow_run_id -e remote_url -e sha -e BINSTAR_TOKEN -e FEEDSTOCK_TOKEN -e STAGING_BINSTAR_TOKEN quay.io/condaforge/linux-anvil-cos7-x86_64 bash /home/conda/feedstock_root/.scripts/build_steps.sh
  2024-10-02T19:25:48.6216131Z + mamba install --update-specs --yes --quiet --channel conda-forge --strict-channel-priority pip mamba conda-build conda-forge-ci-setup=4 'conda-build>=24.1'
  2024-10-02T19:26:18.8514019Z + mamba update --update-specs --yes --quiet --channel conda-forge --strict-channel-priority pip mamba conda-build conda-forge-ci-setup=4 'conda-build>=24.1'
  2024-10-02T19:26:27.3112445Z + setup_conda_rc /home/conda/feedstock_root /home/conda/recipe_root /home/conda/feedstock_root/.ci_support/linux_64_.yaml
  ```
- macOS: 1m3s = 2s (download Miniforge) + 14s (install Miniforge) + 47s (install + update deps)

  ```
  2024-10-02T19:25:17.0039930Z + curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-x86_64.sh
  2024-10-02T19:25:19.3452300Z + bash Miniforge3-MacOSX-x86_64.sh -b -p /Users/runner/miniforge3
  2024-10-02T19:25:33.0164800Z + mamba install --update-specs --quiet --yes --channel conda-forge --strict-channel-priority pip mamba conda-build conda-forge-ci-setup=4 'conda-build>=24.1'
  2024-10-02T19:26:14.3050000Z + mamba update --update-specs --yes --quiet --channel conda-forge --strict-channel-priority pip mamba conda-build conda-forge-ci-setup=4 'conda-build>=24.1'
  2024-10-02T19:26:20.2779970Z + setup_conda_rc ./ ./recipe ./.ci_support/osx_64_.yaml
  ```
- Windows: 4m22s = 7s (download Miniforge) + 1m30s (install Miniforge) + 2m45s (install deps)

  ```
  2024-10-02T18:52:46.6814736Z Installing dependencies
  2024-10-02T18:55:00.1965077Z Setting up configuration
  ```
Some timings for a micromamba-only replacement on macOS and Windows: https://github.com/conda-forge/staged-recipes/pull/27753. Not much of a difference on macOS, but it's much faster on Windows!
I'd expect Windows to be much faster because of the parallel download and extraction. We would get the same benefit with mamba 2.0.
For Pixi, both macOS and Windows take under 30s from scratch:
https://github.com/conda-forge/staged-recipes/pull/27754#issuecomment-2389788447
When you said micromamba was about the same for osx, what was the number? I'm curious.
On staged-recipes macOS, the Miniforge approach takes 1m10s to 1m30s. With micromamba, that goes down to 1min; with Pixi, 30s.
The differences with Windows are striking: from ~4mins to under a minute (micromamba) or even <30s (pixi).
> We would get the same benefit with mamba 2.0.
On Windows, you get <1m with micromamba v1, <2m with micromamba v2, <30s with Pixi.
> We would get the same benefit with mamba 2.0.
@isuruf unfortunately, no. With pixi / rattler the linking is done in a completely parallelized / pipelined way using async Rust (e.g. the whole download -> extraction -> linking per package is done in one go). Clobber issues are resolved after the transaction has executed (vs. ordered installation as in mamba / conda).
So it's not yet possible to reach the same speeds with mamba / conda.
We could build something bespoke with py-rattler though that would reach the same speeds.
Thanks @wolfv @jaimergp and nice work all around!
It appears that micromamba would be an easy win now and we could basically drop it in. We'd need to ensure it uses the same cache as conda/mamba in the docker container.
It also appears we should either move to pixi or a micromamba-like tool built on the same components.
@wolfv Is it possible to use pixi to create an env that has a name (and isn't a global env)? That'd make it easier to work with inside of smithy now I think, though I don't think this is a blocker. The osx and Linux builds share the same env management commands and we mount the feedstock dir and recipe dir separately which could make deciding on a directory for the env a bit tricky.
@beckermr Here is some information on your question for pixi:
You can activate an environment with `run`, `shell` or `shell-hook`. These environments cannot be named (yet), but you can specify a `--manifest-path`, which is a bit more verbose but doesn't require you to be in the project directory.
After activation, all pixi commands will act as if they are in that project. So you can do:

```
> pixi shell --manifest-path /path/to/pixi.toml
(env) > pixi run your_command
```

To activate similarly to `conda activate`, you could use `eval "$(pixi shell-hook --manifest-path /path/to/pixi.toml)"`.
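For example, a hedged sketch of how that could replace today's `conda activate base` step in the build scripts (the manifest path and variables are assumptions, not a decided layout):

```bash
# Non-interactive activation of the pixi-managed build environment,
# standing in for `source .../conda.sh && conda activate base`.
# FEEDSTOCK_ROOT / RECIPE_ROOT mirror the existing scripts; the manifest
# path is an assumption.
eval "$(pixi shell-hook --manifest-path "${FEEDSTOCK_ROOT}/pixi.toml")"
conda build "${RECIPE_ROOT}"  # assumes conda-build is listed in pixi.toml
```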
> On Windows, you get <1m with micromamba v1, <2m with micromamba v2, <30s with Pixi.
This is suspicious. micromamba v2 takes twice as long as micromamba v1?
> @isuruf unfortunately, no.
I was talking about the difference between mamba/conda and micromamba. I don't understand why you are trying to talk about pixi/rattler-build.
> micromamba v2 takes twice as long as micromamba v1?
Might be related to simdjson parsers not being so optimized on Windows (e.g. https://github.com/simdjson/simdjson/issues/847). 1.x used libsolv parsers, IIRC. I think there's a flag for that, let me check 🤔 Edit: nope, didn't change much.
Turns out that part of the slowdown on Windows is due to installing to the C: drive. Changing to D: cuts it in half. See https://github.com/conda-forge/conda-smithy/pull/2076#issuecomment-2391665013.
Let's summarize more or less what I've found out today (no lockfiles):
| Platform | Installer | Time to provision |
|---|---|---|
| Windows | Miniforge (C:) [^1] | ~3-4 min |
| Windows | Miniforge (D:) [^4] | ~1.5-2 min |
| Windows | Micromamba v1 (C:) [^2] | ~1 min |
| Windows | Micromamba v1 (D:) [^2] | ~1 min |
| Windows | Micromamba v2 (C:) [^2] | <2 min |
| Windows | Micromamba v2 (D:) [^2] | ~1 min |
| Windows | Pixi (D:) [^3] | 26s |
| macOS | Miniforge [^1] | 1-1.5 min |
| macOS | Micromamba [^2] | 50s |
| macOS | Pixi [^3] | 24s |
I think the key points we can enforce now without too much controversy are:

- Moving Windows builds to `D:` for a 2x speedup:
  - [ ] https://github.com/conda-forge/conda-smithy/pull/2076
  - [ ] https://github.com/conda-forge/staged-recipes/pull/27767
- Consider using `micromamba` later to save another ~30s here and there:
  - [ ] https://github.com/conda-forge/conda-smithy/pull/2075
  - [ ] https://github.com/conda-forge/staged-recipes/pull/27753
Pixi would be awesome too for a sub-30s deploy, but it will require a bigger overhaul of how the infra is set up.
[^2]: https://github.com/conda-forge/staged-recipes/pull/27753, https://github.com/conda-forge/conda-smithy/pull/2075
[^3]: https://github.com/conda-forge/staged-recipes/pull/27754. Note we can't choose the target directory (yet?).
[^4]: https://github.com/conda-forge/conda-smithy/pull/2076
> Consider using `micromamba` later to save another ~30s here and there:
Does it actually work when building packages? I thought that since the caches aren't shared, this is only moving the cost to a later stage.
I think we can set CONDA_PKGS_DIRS accordingly on Windows. On macOS, we are using ~/.conda, so that should be OK. The caches should be compatible, I hope.
This can be checked with some carefully chosen deps in meta.yaml and then checking the logs in debug mode.
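As a hedged sketch (micromamba honors `CONDA_PKGS_DIRS` as far as I know; the paths are assumptions):

```bash
# Share the package cache between micromamba and conda/mamba so that the
# conda-build run later reuses the downloaded packages. Paths are assumptions.
export CONDA_PKGS_DIRS="${MINIFORGE_HOME}/pkgs"
micromamba create --yes --prefix "${MINIFORGE_HOME}" --channel conda-forge \
    "conda-build>=24.1" conda-forge-ci-setup=4
```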
Is there a specific set of features you're missing that keeps you from moving to pixi?
> Note we can't choose the target directory (yet?).
You could detach environments from their folder with pixi: https://pixi.sh/latest/reference/pixi_configuration/#detached-environments

```
pixi config set --global detached-environments "/where/ever/you/require"
```

Would that be enough? This can be set locally for the project or globally for the machine.
Well @ruben-arts it'd be nice to have a drop-in replacement for conda based on the rattler tools. Then transitions would be very easy for us.
Btw the latest version of pixi (0.32.1) (and py-rattler) should be a lot faster again. We landed some very significant solver improvements. :)
> You could detach environments from their folder with pixi: https://pixi.sh/latest/reference/pixi_configuration/#detached-environments
That's only for .pixi/envs, right? The current workflow assumes a single environment for conda-build, which gets installed to e.g. ~/Miniforge3. The idea would be to have pixi create .pixi/envs/default in ~/Miniforge3, but this is not possible right now, correct?
Otherwise I guess we'd need to craft something with py-rattler, but then we are back in Python bootstrapping land and we'd need to use something like the CI's Python + pip to provide our installer framework (instead of pixi).
> Moving Windows builds to `D:` for a 2x speedup
If you want to experiment with https://github.com/marketplace/actions/setup-dev-drive as well, you may be able to get an even bigger speed increase. (That's not my action, and I don't think I know its creator, but I did help with the underlying functionality, and the action's implementation looks reasonable at a quick glance.)
Basically, the Windows OS drive does a lot of processing on every file access that any other drive will (probably) not do, and a Dev Drive is even more optimised for this kind of use. Hopefully, one day Actions will use a Dev Drive by default, but I don't think they've enabled that yet.