
Installing different variants does not trigger a rebuild

Open leofang opened this issue 4 months ago • 14 comments

Checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pixi, using pixi --version.

Reproducible example

Reproducer:

$ pixi --version
pixi 0.49.0
$ git clone --recursive https://github.com/cupy/cupy.git
$ cd cupy
$ # add the pixi.toml below to the root
$ pixi install -e cu129 -vv
$ pixi install -e cu128 -vv

This is my pixi.toml

# Specifies properties for the whole workspace
[workspace]
channels = ["conda-forge"]
platforms = ["linux-64", "linux-aarch64", "win-64"]
preview = ["pixi-build"]  # the feature is in preview

[workspace.build-variants]
cuda-version = ["12.9.*", "12.8.*"]

[feature.cu129.dependencies]
cuda-version = "12.9.*"

[feature.cu128.dependencies]
cuda-version = "12.8.*"

[environments]
cu129 = ["cu129"]
cu128 = ["cu128"]

# There can be multiple packages in a workspace
# In `package` you specify properties specific to the package
[package]
name = "cupy"
version = "14.0.0a0"

# Here the build system of the package is specified
# We are using `pixi-build-python` in order to build a Python package
[package.build]
backend = { name = "pixi-build-python", version = "*" }
channels = [
  "https://prefix.dev/pixi-build-backends",
  "https://prefix.dev/conda-forge",
]

[package.build.configuration]
env = { NVCC = "$PREFIX/bin/nvcc", CUDA_PATH = "$PREFIX/targets/x86_64-linux/", CUPY_NVCC_GENERATE_CODE = "arch=compute_86,code=sm_86;arch=compute_89,code=sm_89", CUPY_NUM_BUILD_JOBS = "$(nproc)" }
noarch = false  # Build platform-specific package

[package.host-dependencies]
cuda-version = "*"
libcublas-dev = "*" 
libcusolver-dev = "*" 
libcusparse-dev = "*"
libcufft-dev = "*" 
libcurand-dev = "*"
cuda-nvcc = "*"
cuda-nvrtc-dev = "*"
cuda-nvtx-dev = "*"
cuda-profiler-api = "*"
fastrlock = ">=0.5"
setuptools = ">=77"
cython = ">=3.0,<3.2"
python = ">=3.9"

# We add our package as dependency to the workspace
# If the directory contains a `pixi.toml`, `pixi-build` will be used to build the package
[dependencies]
cupy = { path = "." }
python = ">=3.9"
numpy = "*"
fastrlock = ">=0.5"

which more or less follows what we have in cupy-feedstock.

Issue description

Hey team, I made a pixi.toml for building CuPy from source (with CuPy developers in mind) and it’s blazingly fast, thank you very much for the awesome work!!!

I have one observation about the caching behavior for variants. I wanted to have variants cu120, cu121, …, cu129 so that I can build/test against any CUDA 12.x version. However, I find that if I run the above reproducer, from the second time onward there is no rebuild (so installing cu128 does not recompile at all). Am I doing something wrong?

Expected behavior

Each variant should be built once and then cached; installing an uncached variant should trigger a build.

leofang avatar Jul 14 '25 21:07 leofang

This may have been improved already since 0.49.0; I am looking specifically at https://github.com/prefix-dev/pixi/pull/4094/files#diff-e982128a03866a637867eb36b7ebe81c1a79af9df636bb5d4cc5c4b9efb17282R702-R704. @baszalmstra do you reckon this is covered?

lucascolley avatar Jul 14 '25 21:07 lucascolley

What happens is that two variants are generated for cupy, but both environments select the same variant. This is because nothing in the run dependencies of cupy pins the CUDA version.

Although a variant is applied to cuda-version, it is only applied in the host dependencies. And because cuda-version has no run exports, the pin does not propagate to the run dependencies.

To fix this, add cuda-version to the run dependencies as well:

[package.run-dependencies]
cuda-version = "*"

I think that will solve it.
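A toy sketch of why the host-only pin collapses both environments onto one cached build (a simplified model of variant/cache keying, not pixi's actual implementation):

```python
import hashlib
import json

def cache_key(run_deps):
    # Toy model: if a package's cached build is keyed on its resolved run
    # dependencies, variants whose run deps are identical collide.
    payload = json.dumps(run_deps, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

base = {"python": ">=3.9", "numpy": "*", "fastrlock": ">=0.5"}

# cuda-version only in host-dependencies: the run deps look the same for
# both environments, so both select the same cached variant.
print(cache_key(base) == cache_key(dict(base)))  # True

# cuda-version added to run-dependencies: the pin now differentiates variants.
cu129 = {**base, "cuda-version": "12.9.*"}
cu128 = {**base, "cuda-version": "12.8.*"}
print(cache_key(cu129) == cache_key(cu128))  # False
```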

baszalmstra avatar Jul 15 '25 04:07 baszalmstra

Thanks, Bas! I gave it a shot but it still did not rebuild (no compilation happens after this heading was printed):

 │ │   ************************************************************
 │ │   * CuPy Configuration Summary                               *
 │ │   ************************************************************

Here's the diff I applied:

diff --git a/old.toml b/pixi.toml
index 2ca6cf274..18f4b0dcc 100644
--- a/old.toml
+++ b/pixi.toml
@@ -52,10 +52,13 @@ setuptools = ">=77"
 cython = ">=3.0,<3.2"
 python = ">=3.9"
 
+[package.run-dependencies]
+python = ">=3.9"
+numpy = "*"
+fastrlock = ">=0.5"
+cuda-version = "*"
+
 # We add our package as dependency to the workspace
 # If the directory contains a `pixi.toml`, `pixi-build` will be used to build the package
 [dependencies]
 cupy = { path = "." }
-python = ">=3.9"
-numpy = "*"
-fastrlock = ">=0.5"

Any chance you can spot what else could be off?

leofang avatar Jul 15 '25 20:07 leofang

It's a bit hard for me to test locally because the build fails with:

 │ │   Finished generating code
 │ │   **************************************************
 │ │   *** WARNING: Cannot check compute capability
 │ │   Cannot execute a stub file.
 │ │   Original error: Command 'C:\Users\zalms\AppData\Local\Temp\tmpwhdjbjx4\a' returned non-zero exit status 1.
 │ │   **************************************************

But let me first clarify what you mean with:

Thanks, Bas! I gave it a shot but it still did not rebuild (no compilation happens after this heading was printed)

So what you are seeing is that the second install command (with -e cu128) does start a recompilation, i.e., it's not the same as running the first command (with -e cu129) twice, but the incremental compilation itself doesn't seem to rebuild any files?

baszalmstra avatar Jul 15 '25 21:07 baszalmstra

It's a bit hard for me to test locally because the build fails with:

 │ │   Finished generating code
 │ │   **************************************************
 │ │   *** WARNING: Cannot check compute capability
 │ │   Cannot execute a stub file.
 │ │   Original error: Command 'C:\Users\zalms\AppData\Local\Temp\tmpwhdjbjx4\a' returned non-zero exit status 1.
 │ │   **************************************************

But let me first clarify what you mean with:

Thanks, Bas! I gave it a shot but it still did not rebuild (no compilation happens after this heading was printed)

So what you are seeing is that the second install command (with -e cu128) does start a recompilation, i.e., it's not the same as running the first command (with -e cu129) twice, but the incremental compilation itself doesn't seem to rebuild any files?

ping @leofang

lucascolley avatar Sep 05 '25 13:09 lucascolley

Sorry Bas/Lucas, I am a bit swamped and will try to get to this next week 🙇

cc @cpcloud for vis (who has looked into a similar situation on our side)

leofang avatar Sep 18 '25 01:09 leofang

I closed issue #4622 because this issue is better described here.

Here is a link to an example repo with the issue. I describe how to reproduce the issue in the README.md

Here was the issue:

Issue 2: Build cache reused across Python versions

  • After building with py313, Pixi creates a .pixi/build artifact.
  • Switching to py312 does not trigger a rebuild; Pixi instead reuses the 3.13 build.

Reproducing steps:

# First run py313 environment
pixi run -e py313 python test.py

# Then try py312 environment
pixi run -e py312 python test.py

This will error because pointops_cuda was built with Python 3.13. You can confirm this because libs/pointops/functions/_C.cpython-313-x86_64-linux-gnu.so was generated. The issue is that the run in the second environment reuses the same build directory, .pixi/build.
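As an illustrative aside: CPython tags extension modules with an ABI suffix that encodes the interpreter version, which is why a module built under 3.13 cannot be imported by 3.12. You can inspect the suffix of the running interpreter with:

```python
import sysconfig

# The extension-module suffix encodes the interpreter version and platform,
# e.g. ".cpython-313-x86_64-linux-gnu.so" on Linux under Python 3.13.
# Python 3.12 will not import a module carrying the 3.13 tag, so a build
# directory shared across environments produces the error described above.
print(sysconfig.get_config_var("EXT_SUFFIX"))
```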

Potential Fix?

Could a potential solution be to place the build directory inside the environment that is being used?

lllangWV avatar Sep 19 '25 12:09 lllangWV

Sorry for my late reply. @baszalmstra @lucascolley can we reopen this issue? I don't think this is fixed.

So what you are seeing is that the second install command (with -e cu128) does start a recompilation e.g. its not the same as running the first command (with -e cu129) twice, but the incremental compilation itself doesn't seem to rebuild any files?

Right, that is what I saw.

I saw the recent PR #4665 and have updated to the latest pixi (0.59.0). Unfortunately, I don't think it does the job. In particular, it's been confusing to me what build variants versus environments/features mean in pixi; I suspect I used the wrong terminology in the issue title?

What I actually want is to be able to create a Cartesian product of envs (Python versions × CUDA versions) on demand:

$ pixi install -e py312-cu13 -vv

I imagine the manifest should look like this (@baszalmstra this version should avoid the build error that you encountered earlier; I verified it by cutting the manifest down to a single variant):

[workspace]
channels = ["conda-forge"]
platforms = ["linux-64", "linux-aarch64", "win-64"]
preview = ["pixi-build"]  # the feature is in preview

[workspace.build-variants]
python = ["3.10.*", "3.11.*", "3.12.*", "3.13.*", "3.14.*"]
cuda-version = ["12.*", "13.*"]

[feature.cu13.system-requirements]
cuda = "13"

[feature.cu13.dependencies]
cuda-version = "13.*"

[feature.cu12.system-requirements]
cuda = "12"

[feature.cu12.dependencies]
cuda-version = "12.*"

[feature.py314.dependencies]
python = "3.14.*"

[feature.py313.dependencies]
python = "3.13.*"

[feature.py312.dependencies]
python = "3.12.*"

[feature.py311.dependencies]
python = "3.11.*"

[feature.py310.dependencies]
python = "3.10.*"

[environments]
py314-cu13 = ["py314", "cu13"]
py314-cu12 = ["py314", "cu12"]
py313-cu13 = ["py313", "cu13"]
py313-cu12 = ["py313", "cu12"]
py312-cu13 = ["py312", "cu13"]
py312-cu12 = ["py312", "cu12"]
py311-cu13 = ["py311", "cu13"]
py311-cu12 = ["py311", "cu12"]
py310-cu13 = ["py310", "cu13"]
py310-cu12 = ["py310", "cu12"]

[package]
name = "cupy"
version = "14.0.0a0"

[package.build]
backend = { name = "pixi-build-python", version = "*" }

[package.build.config]
env = { CUPY_NVCC_GENERATE_CODE = "current", CUPY_NUM_BUILD_JOBS = "$(nproc)" }
noarch = false
# For some reason it has to be "cuda-nvcc" instead of "cuda" as documented by pixi-build
compilers = ["c", "cxx", "cuda-nvcc"]

[package.build.target.linux-64.config.env]
CUDA_PATH = "$PREFIX/targets/x86_64-linux"
NVCC = "$BUILD_PREFIX/bin/nvcc"

[package.build.target.linux-aarch64.config.env]
CUDA_PATH = "$PREFIX/targets/sbsa-linux"
NVCC = "$BUILD_PREFIX/bin/nvcc"

[package.build.target.win-64.config.env]
CUDA_PATH = '%PREFIX%\Library'
NVCC = "%BUILD_PREFIX%\\Library\\bin\\nvcc"

[package.build-dependencies]
# For some reason "cuda-nvcc" in config.compilers is not enough
cuda-compiler = "*"

[package.host-dependencies]
cuda-version = "*"
libcublas-dev = "*" 
libcusolver-dev = "*" 
libcusparse-dev = "*"
libcufft-dev = "*" 
libcurand-dev = "*"
cuda-cudart-static = "*"
cuda-nvrtc-dev = "*"
cuda-nvtx-dev = "*"
cuda-profiler-api = "*"
cutensor = "~=2.3"
setuptools = ">=77"
cython = ">=3.0,<3.2"
python = "*"

[package.target.linux.host-dependencies]
cuda-driver-dev = "*"
nccl = "~=2.16"

[package.run-dependencies]
python = "*"
cuda-version = "*"
numpy = ">=2.0"

[dependencies]
cupy = { path = "." }

[target.linux-64.activation.env]
CUDA_PATH = "$CONDA_PREFIX/targets/x86_64-linux"

[target.linux-aarch64.activation.env]
CUDA_PATH = "$CONDA_PREFIX/targets/sbsa-linux"

[target.win.activation.env]
CUDA_PATH = '%CONDA_PREFIX%\Library'

Right now this throws seemingly random environment-resolution errors (a different one each time). I suspect it has to do with the default env conflicting with the variant envs.

Error:   × failed to solve requirements of environment 'py313-cu12' for platform 'linux-64'
  ├─▶   × failed to extract metadata for package 'cupy'
  │   
  ├─▶   × while trying to solve the host environment for the package
  │   
  ├─▶   × failed to solve the environment
  │   
  ╰─▶ Cannot solve the request because of: The following packages are incompatible
      ├─ __cuda * can be installed with any of the following options:
      │  └─ __cuda 12
      └─ cuda-version 13.* cannot be installed because there are no viable options:
         └─ cuda-version 13.0 | 13.1 would constrain
            └─ __cuda >=13, which conflicts with any installable versions previously reported
     

leofang avatar Nov 11 '25 22:11 leofang

what happens if you set no-default-feature for the environments?

lucascolley avatar Nov 11 '25 22:11 lucascolley

I literally just tried :D Unfortunately, no difference.

[environments]
py314-cu13 = { features = ["py314", "cu13"], no-default-feature = true }
py314-cu12 = { features = ["py314", "cu12"], no-default-feature = true }
py313-cu13 = { features = ["py313", "cu13"], no-default-feature = true }
py313-cu12 = { features = ["py313", "cu12"], no-default-feature = true }
py312-cu13 = { features = ["py312", "cu13"], no-default-feature = true }
py312-cu12 = { features = ["py312", "cu12"], no-default-feature = true }
py311-cu13 = { features = ["py311", "cu13"], no-default-feature = true }
py311-cu12 = { features = ["py311", "cu12"], no-default-feature = true }
py310-cu13 = { features = ["py310", "cu13"], no-default-feature = true }
py310-cu12 = { features = ["py310", "cu12"], no-default-feature = true }

leofang avatar Nov 11 '25 22:11 leofang

Any chance I shouldn't use the package section? Maybe it is not fully compatible with environments/features?

leofang avatar Nov 11 '25 23:11 leofang

I shrank the Cartesian product to only two elements and it still fails:

[workspace.build-variants]
python = ["3.14.*", "3.14.*"]
cuda-version = [">=12.0,<13", ">=13.0,<14"]

[feature.py314-cu13.dependencies]
python = "3.14.*"
cuda-version = ">=13.0,<14"

[feature.py314-cu12.dependencies]
python = "3.14.*" 
cuda-version = ">=12.0,<13"

[environments]
py314-cu13 = { features = ["py314-cu13"] }
py314-cu12 = { features = ["py314-cu12"] }

The error message makes me think that the default environment carries an assumption incompatible with the variants, which confuses the dependency resolver:

Error:   × failed to solve requirements of environment 'py314-cu13' for platform 'linux-64'
  ├─▶   × failed to extract metadata for package 'cupy'
  │   
  ├─▶   × while trying to solve the host environment for the package
  │   
  ├─▶   × failed to solve the environment
  │   
  ╰─▶ Cannot solve the request because of: The following packages are incompatible
      ├─ cuda-version >=13.0,<14 cannot be installed because there are no viable options:
      │  ├─ cuda-version 13.1, which conflicts with the versions reported above.
      │  └─ cuda-version 13.0, which conflicts with the versions reported above.
      └─ cuda-version >=12.0,<13 cannot be installed because there are no viable options:
         ├─ cuda-version 12.9, which conflicts with the versions reported above.
         ├─ cuda-version 12.8, which conflicts with the versions reported above.
         ├─ cuda-version 12.6, which conflicts with the versions reported above.
         ├─ cuda-version 12.5, which conflicts with the versions reported above.
         ├─ cuda-version 12.4, which conflicts with the versions reported above.
         ├─ cuda-version 12.4, which conflicts with the versions reported above.
         ├─ cuda-version 12.3, which conflicts with the versions reported above.
         ├─ cuda-version 12.3, which conflicts with the versions reported above.
         ├─ cuda-version 12.2, which conflicts with the versions reported above.
         ├─ cuda-version 12.2, which conflicts with the versions reported above.
         ├─ cuda-version 12.1, which conflicts with the versions reported above.
         ├─ cuda-version 12.1, which conflicts with the versions reported above.
         ├─ cuda-version 12.1, which conflicts with the versions reported above.
         ├─ cuda-version 12.1, which conflicts with the versions reported above.
         ├─ cuda-version 12.0, which conflicts with the versions reported above.
         ├─ cuda-version 12.0, which conflicts with the versions reported above.
         ├─ cuda-version 12.0, which conflicts with the versions reported above.
         ├─ cuda-version 12.0, which conflicts with the versions reported above.
         └─ cuda-version 12.0.0, which conflicts with the versions reported above.

I mean, how is it possible that cuda-version 13.0 conflicts with cuda-version >=13.0,<14?

leofang avatar Nov 11 '25 23:11 leofang

On the other hand, if I break the Cartesian product and let cuda-version float (so that it always uses CUDA 13), my manifest above works just fine:

[feature.py314.dependencies]
python = "3.14.*"

[feature.py313.dependencies]
python = "3.13.*"

[feature.py312.dependencies]
python = "3.12.*"

[feature.py311.dependencies]
python = "3.11.*"

[feature.py310.dependencies]
python = "3.10.*"

[environments]
py314 = ["py314"]
py313 = ["py313"]
py312 = ["py312"]
py311 = ["py311"]
py310 = ["py310"]

leofang avatar Nov 11 '25 23:11 leofang

@ruben-arts @wolfv I am beginning to think https://github.com/prefix-dev/pixi/issues/4303 is the same issue as what I am seeing above. Could you kindly take a look?

leofang avatar Nov 11 '25 23:11 leofang