NumPy 2 bringup
NumPy is currently working on a NumPy 2.0 release, which is planned to come out later this year. Here are the current (draft) release notes. Also here's the upstream tracking issue ( https://github.com/numpy/numpy/issues/24300 ), and ecosystem compatibility tracker.
Some questions worth discussing:
- How do we handle tightening loose numpy pins in packages ( https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/issues/516 )?
- Do we want to keep building for NumPy 1 & 2 at the same time?
- What timeline do we want to use for adding NumPy 2?
- What timeline do we want to use for dropping NumPy 1?
- How should this be coordinated around other major migrations (like Python 3.12 or a new Boost)?
- NumPy now builds for the "oldest-available-C-API" by default, i.e. we could build for our current 1.22 C-API using numpy 1.25. We need to adapt to this in any case, at the latest when our lower bound becomes 1.25 (see https://github.com/conda-forge/conda-forge-pinning-feedstock/issues/4816).
- Anything else we should discuss?
Todos:
- [x] https://github.com/conda-forge/numpy-feedstock/pull/313
- [x] https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/pull/728
- [x] https://github.com/regro/cf-scripts/issues/2469
- [x] https://github.com/regro/cf-scripts/pull/2470
- [x] https://github.com/conda-forge/conda-smithy/issues/1911
- [x] https://github.com/conda-forge/staged-recipes/pull/26188
- [x] https://github.com/conda-forge/numpy-feedstock/pull/314
- [x] https://github.com/conda-forge/conda-forge-pinning-feedstock/pull/5790
- [x] https://github.com/conda-forge/conda-forge-pinning-feedstock/pull/5851
- [ ] https://github.com/conda-forge/conda-forge.github.io/issues/2156
cc @conda-forge/core
See also https://github.com/conda-forge/conda-forge-pinning-feedstock/issues/4816
Can we merge these two issues just to make it easier to track them?
The issue Axel raises seems like a subpoint of this issue (depending on what we decide). Namely, do we want to opt in to this newer/slimmer ABI, and how does that fit into NumPy 2?
Sure. IMO Axel's issue is a subset of this one. I don't have strong opinions on which one to keep, or if you want to keep both, but I also don't want to get lost on two mega-threads :grimacing:
Added Axel's item to the list above
Handling the ABI is the key point here (that and current packages missing a <2). I updated the added item because the summary was not accurate.
Normally I'd say we do a dual migration (keep 1.x; add 2.0), but numpy has around 5000** dependents in conda-forge, so that would be a pretty substantial CI impact, especially if it takes a while to drop 1.x.
**
>mamba repoquery whoneeds numpy -c conda-forge -p linux-64 > tmp
># edit to remove header
>python
>>> q = open("tmp", "r").readlines()
>>> p = {x.strip().split(" ")[0] for x in q} - {""}
>>> len(p)
4898
Obviously not all of them are compiling against numpy, but still...
I updated the added item because the summary was not accurate.
Thanks Axel! 🙏
Anyone should feel free to update the issue as needed 🙂
Following up on our discussion earlier about improving the visibility of NPY_FEATURE_VERSION, I started this NumPy PR ( https://github.com/numpy/numpy/pull/24861 ) to emit a message about how the value is set
It also includes a note about one approach we might take to ensure that value is embedded in built binaries, though maybe there are better approaches for that portion
There should now be a string baked into binaries built with NumPy to denote what kind of NumPy compatibility they have
https://github.com/numpy/numpy/pull/25948
It is worth noting that thanks to Axel and others we now have NumPy 2.0.0rc1 packages: https://github.com/conda-forge/numpy-feedstock/issues/311
Also ecosystem support of NumPy 2 is being tracked in this issue Ralf opened: https://github.com/numpy/numpy/issues/26191
We are now in a good spot to start testing building packages with NumPy 2
I discussed this with @rgommers recently and one important point that he brought up is the situation with pin_compatible, which we'll have to fix as part of any migration effort, probably with a piggyback migrator, since we'll need to rewrite the recipes.
In particular, since numpy isn't separated into a library and run-time component, we don't have a run-export, and so feedstocks use pin_compatible under run:. However, this will be doubly incorrect in the new setup: first, because our NPY_FEATURE_VERSION (which forms the lower bound) will be lower than the numpy version at build time, and second, because the upper bound should be something like <2.{{ numpy.split(".")[1] | int + 3 }} (for a project that's free of deprecation warnings; anything else might be deprecated in 2.{N + 1} and removed after two releases in 2.{N + 3}).
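As a rough illustration of the upper-bound arithmetic above, here is a minimal Python sketch. The helper name `numpy_upper_bound` is made up for this example; it assumes a build-time version string of the form `2.N[.P]` and a project that is free of deprecation warnings.

```python
def numpy_upper_bound(build_version: str, grace_releases: int = 3) -> str:
    """Return the exclusive upper bound for something built against numpy 2.N.

    Hypothetical helper: assumes that anything not yet deprecated in 2.N
    could be deprecated in 2.(N+1) and removed in 2.(N+3) at the earliest.
    """
    major, minor = build_version.split(".")[:2]
    return f"<{major}.{int(minor) + grace_releases}"


print(numpy_upper_bound("2.0.0"))  # <2.3
print(numpy_upper_bound("2.1.3"))  # <2.4
```

This mirrors the Jinja expression in the comment; a project that does emit deprecation warnings would need a smaller `grace_releases`.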
In particular, since numpy isn't separated into a library and run-time component, we don't have a run-export [...]
Of course, if there's appetite for a split into libnumpy (with a run-export) and numpy (the python bits), that might be worth a thought as well. But then even moreso, we'd need a piggyback.
Of course, if there's appetite for a split into libnumpy (with a run-export) and numpy (the python bits), that might be worth a thought as well.
That doesn't sound good to me as a custom conda-forge split. If we want anything like that, let's do this properly and create a numpy-headers package that's officially supported by NumPy and that can be used by anyone (unless they need a static library or numpy.f2py) with a build-time dependency on the NumPy C API. We actually discussed this in a NumPy community meeting, and it seems feasible.
In particular, since numpy isn't separated into a library and run-time component, we don't have a run-export, and so feedstocks use pin_compatible under run:
We do have a run_export on numpy.
Yeah... Clearly I shouldn't be writing these comments from a phone 😅
I misremembered that part, but in that case the task becomes easier - we just set up the right run export in numpy itself, and then remove pin_compatible in feedstocks that compile against numpy. Right?
Another question we have to answer soon: what mechanism do we want to use for setting NPY_FEATURE_VERSION... Perhaps the easiest would be an activation script in numpy, but that's a fairly big hammer, as it persists beyond build time and into all user environments.
Right now I'm thinking of setting NPY_FEATURE_VERSION in the global pinning (cleanly overrideable per feedstock where necessary), and then using that in conda-forge-ci-setup to populate the environment variable that numpy will pick up (and if necessary, in the compiler activation feedstocks, e.g. for CFLAGS).
The only issue there is that the run-export on numpy is not dynamic, in the sense that it gets fixed to the value of NPY_FEATURE_VERSION at the build time of numpy, and not the (potentially different) one in play when building something else against numpy. Unless I'm overlooking something, we'd therefore need to replace rather than remove the existing uses of pin_compatible("numpy"), with something like
- {{ pin_compatible("numpy", lower_bound=NPY_FEATURE_VERSION) }}
while the upper bound (<2.{N + 3}) would be set by the run-export on numpy.
What if we had something like...?
{% set version = "2.0.0" %}
package:
name: numpy
version: {{ version }}
...
build:
...
run_exports:
- {{ pin_subpackage("numpy", lower_bound=os.environ.get("NPY_FEATURE_VERSION", version)) }}
That way we can defer this environment variable setting to packages
If they don't set something, we can provide a sensible default (either version or something else we decide)
We could also consider whether conda-build could allow NPY_FEATURE_VERSION to be a pass-through environment variable, or whether we handle that within conda-forge with some recipe changes to pass it through ourselves. This would let us use a global setting (as you suggest)
I don't think this type of NPY_FEATURE_VERSION setting is useful at all. NumPy guarantees to set it to a version that is not higher than the first numpy release that supported the Python minor version being built for. So all produced extension modules will work with all possible numpy versions that can actually be installed.
Hence, doing nothing should be the right default here, trying to change it from NumPy's default will likely only be the cause of extra complexity/confusion, and perhaps bugs.
That way we can defer this environment variable setting to packages
I'd be surprised if it works like that. AFAIU, that os.environ call will be resolved while building numpy.
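To make that concern concrete, here is a small Python sketch. It is not conda-build itself: `render_run_export` is a hypothetical stand-in for recipe rendering, and the output format is simplified. The point it illustrates is that the environment lookup happens once, when the numpy recipe is rendered, so the stored string no longer reacts to the environment of a downstream build.

```python
def render_run_export(version: str, environ: dict) -> str:
    """Hypothetical stand-in for rendering numpy's run_exports entry."""
    lower = environ.get("NPY_FEATURE_VERSION", version)
    return f"numpy >={lower}"


# Rendering happens while building numpy; no feature version is set there.
numpy_build_env = {}
baked_run_export = render_run_export("2.0.0", numpy_build_env)

# A downstream feedstock sets the variable later, but the string stored in
# the numpy package metadata was already fixed at numpy's build time.
downstream_env = {"NPY_FEATURE_VERSION": "1.22"}

print(baked_run_export)                            # numpy >=2.0.0
print(render_run_export("2.0.0", downstream_env))  # numpy >=1.22, only if re-rendered
```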
I don't think this type of NPY_FEATURE_VERSION setting is useful at all. NumPy guarantees to set it to a version that is not higher than the first numpy release that supported the Python minor version being built for.
Leaving aside NEP29, this is a quantity we have to be able to control IMO. Otherwise our metadata for packages building against numpy is bound to be wrong, and deteriorate over time (when numpy inevitably moves the lower bound, and we don't move in sync across all our feedstocks). I don't see how we can reasonably avoid making NPY_FEATURE_VERSION explicit in conda-forge in some way.
I very well could be wrong. It is easy to test
Really we just need more ideas to sample from. It's more important that we have a large variety before selecting one. So feel free to propose more
Otherwise our metadata for packages building against numpy is bound to be wrong, and deteriorate over time (when numpy inevitably moves the lower bound, and we don't move in sync across all our feedstocks).
It won't be wrong. The metadata that is in the upstream releases (i.e. the dependencies key in pyproject.toml files) is going to be updated by package authors, and that's the thing that should be relied on by conda-forge builds. The build-time version of numpy is now simply taken off the table completely, it no longer adds an extra constraint.
I'm not sure I follow. Say a project has numpy >=1.24,<2.3 in its pyproject.toml, is there some sort of hook that populates NPY_FEATURE_VERSION to 1.24? If so, how would that constraint arrive in the metadata of the packages we build?
Or do you mean that the default for that in numpy is so low (1.19?) that it won't ever become a factor? That seems doubtful to me.
Even aside from those questions, we still have an interest to provide a common baseline for numpy compatibility (so that most of conda-forge is still usable with the oldest supported numpy), and avoid that individual packages move on too quickly (unless they really need to), or extremely slowly (i.e. going back to 1.19 adds about 2 years on top of what NEP29 foresees w.r.t. being able to use a given ABI feature).
The build-time version of numpy is now simply taken off the table completely, it no longer adds an extra constraint.
In summary, this seems highly dubious to me. There's still a lower bound somewhere, either in numpy's default feature level, or in an explicit override of NPY_FEATURE_VERSION. However it comes to be, we should represent that lower bound in our package metadata exactly (or at the very least, something tighter).
Or do you mean that the default for that in numpy is so low (1.19?) that it won't ever become a factor? That seems doubtful to me.
This. And it's not doubtful, it is guaranteed to work. The whole point is to take away build-time version as a thing that dynamically overrides the declared runtime dependency range.
Even aside from those questions, we still have an interest to provide a common baseline for numpy compatibility (so that most of conda-forge is still usable with the oldest supported numpy), and avoid that individual packages move on too quickly
No, that does not make sense. If a package has numpy>=x.y in its constraints, you cannot just ignore that. The package author bumped the lower version for some reason, so if you tweak the metadata to say numpy>=x.y-N instead, you will allow a broken combination of packages.
In summary, this seems highly dubious to me. There's still a lower bound somewhere, either in numpy's default feature level, or in an explicit override of NPY_FEATURE_VERSION. However it comes to be, we should represent that lower bound in our package metadata exactly (or at the very least, something tighter).
No, and no. The lower bound is whatever dependencies=numpy... in pyproject.toml says, or it's a bug in the package (even if the package internally sets NPY_FEATURE_VERSION, which should be quite rare).
What the conda-forge tooling should do is check that the meta.yaml and pyproject.toml metadata is consistent - and I think that is a feature already present for regular Python packages. I.e., start treating numpy like any other Python package when building against numpy 2.x.
No, that does not make sense.
You chopped off the part of my quote that accounts for the scenario you describe.
The lower bound is whatever dependencies=numpy... in pyproject.toml says
I'm not saying we should disregard runtime constraints. I'm saying we also need to express constraints arising from the feature level - both of those can be attached to the same package without conflict. They stack and the intersection of both is what's actually permissible for the final artefact.
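A minimal Python sketch of that stacking, with illustrative version numbers only. The helper names are hypothetical, and versions are simplified to dotted integers (no pre-release tags).

```python
def parse(version: str) -> tuple:
    """Turn '1.24' into (1, 24) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def effective_lower_bound(runtime_lower: str, feature_version: str) -> str:
    """Both bounds must hold, so their intersection is the larger one."""
    return max(runtime_lower, feature_version, key=parse)


# Runtime metadata is stricter than the feature level: runtime bound wins.
print(effective_lower_bound("1.24", "1.22"))  # 1.24
# Runtime metadata is looser than the feature level: feature level wins.
print(effective_lower_bound("1.15", "1.19"))  # 1.19
```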
What the conda-forge tooling should do is check that the environment.yml and pyproject.toml metadata is consistent
I don't see this happening soon enough to be available for the 2.0 transition, it would need work on conda-build AFAICT.
and I think that is a feature already present for regular Python packages. I.e., start treating numpy like any other Python package when building against numpy 2.x.
I'm not sure what you mean here. Presumably by "python package" you don't mean "pure python" packages? Anything else that has a run-export (to my knowledge) uses the build-time version as a lower bound. That's precisely the issue that requires attention here, because of the very unusual situation where building against numpy 2.0 produces something compatible with >=1.x.
I'm saying we also need to express constraints arising from the feature level - both of those can be attached to the same package without conflict. They stack and the intersection of both is what's actually permissible for the final artefact.
What I am trying to explain is that that stacking is not doing anything, because
- numpy will never set the feature version in a way that allows for this to have any effect, and
- if a package does this internally by setting NPY_FEATURE_VERSION to something higher than what it says in its pyproject.toml, that's a bug in the package and should be fixed there by fixing its dependencies= metadata.
I'm not sure what you mean here. Presumably by "python package" you don't mean "pure python" packages?
Ah, I did mean this since I remember dependencies being flagged on PRs - but it may not be ready indeed, since it's marked as experimental:
So it's still mostly manual then, depending on the feedstock maintainers to keep pyproject.toml and meta.yaml in sync?
For clarification, should environment.yml here be the recipe's meta.yaml? Or do you mean something else Ralf?
There are different levels of bot inspection or automation. However this is opt-in at this point. It is seeing some use in conda-forge, but we are probably not at the point where we could turn this on by default. Though that's a separate discussion I think
For clarification, should environment.yml here be the recipe's meta.yaml? Or do you mean something else Ralf?
Yes indeed. General tiredness 🤦🏼. Editing my comment to say meta.yaml to avoid further confusion.
Thanks Ralf! 🙏 All good. Appreciate hearing your insights and having your support here 🙂
Can imagine there are a lot of spinning plates with this work 😅
What I am trying to explain is that that stacking is not doing anything, because numpy will never set the feature version in a way that allows for this to have any effect, and
I'm still not sure we're speaking the same language here. We'll have numpy 2.0 in the host environment, and we need to create a lower bound for the numpy run-export, i.e. numpy >=1.x. Where should this x come from?
We really need to have correct metadata, because the solver will flee into the past if we don't close off incorrect avenues. For example, we're still building for py38, and have numpy 1.18 builds for that. Anything built against numpy 2.0 (assuming I understood correctly that the default ABI level is going to be 1.19) needs to be impossible to install with 1.18, hence needs the right constraints.
You're saying (effectively) that people's numpy dependencies= in their pyproject.toml are always going to be higher than the default NPY_FEATURE_VERSION, and that's an assumption that I don't think will hold. As the first (I swear...) random example I looked at, cvxpy HEAD uses numpy >=1.15. So we need to deal with this.
And once we deal with this, it's IMO not a good idea to just define a fixed run-export in numpy itself, because packages will want to override this - for example, if they need an ABI feature that's newer than the default of 1.19 for whatever reason. Hence why I'm tending towards
- {{ pin_compatible("numpy", lower_bound=NPY_FEATURE_VERSION) }}
per feedstock, with NPY_FEATURE_VERSION part of the global pinning (and overrideable per feedstock) instead of numpy.
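A rough Python sketch of that lookup order, under stated assumptions: the dictionary keys and function names are hypothetical, the value 1.19 is only the default discussed in this thread, and the real conda_build_config.yaml mechanics are not modelled.

```python
# Hypothetical global pinning entry; 1.19 is the default discussed above.
GLOBAL_PINNING = {"NPY_FEATURE_VERSION": "1.19"}


def resolve_feature_version(feedstock_config: dict) -> str:
    """A per-feedstock override wins; otherwise use the global pinning."""
    return feedstock_config.get(
        "NPY_FEATURE_VERSION", GLOBAL_PINNING["NPY_FEATURE_VERSION"]
    )


def run_requirement(feedstock_config: dict) -> str:
    """Lower bound that the pin_compatible call would put under run:."""
    return f"numpy >={resolve_feature_version(feedstock_config)}"


print(run_requirement({}))                               # numpy >=1.19
print(run_requirement({"NPY_FEATURE_VERSION": "1.23"}))  # numpy >=1.23
```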
Anything built against numpy 2.0 (assuming I understood correctly that the default ABI level is going to be 1.19) needs to be impossible to install with 1.18, hence needs the right constraints.
It is impossible. The lowest Python version supported by NumPy 2.0 is 3.9, and there is no numpy 1.18 package for Python 3.9 or later on either conda-forge or PyPI:
$ mamba search numpy=1.18.5
Loading channels: done
# Name Version Build Channel
numpy 1.18.5 py36h7314795_0 conda-forge
numpy 1.18.5 py36he0f5f23_0 conda-forge
numpy 1.18.5 py37h8960a57_0 conda-forge
numpy 1.18.5 py38h8854b6b_0 conda-forge
So yes, by design of how numpy sets the default targeted C API, it is impossible to get an incompatible combination here, even in an example like cvxpy where they set their lower bound to 1.15 (that could well be valid if they still support Python 3.7, no idea).
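The same argument as a small Python check, using only the build strings from the search output above. The helper `python_of` is made up for illustration, and the check assumes that a package built against numpy 2.0 can only land in an environment with Python 3.9 or later.

```python
import re

# Build strings copied from the `mamba search numpy=1.18.5` output above.
BUILDS_1_18_5 = [
    "py36h7314795_0",
    "py36he0f5f23_0",
    "py37h8960a57_0",
    "py38h8854b6b_0",
]
MIN_PYTHON_FOR_NUMPY_2 = (3, 9)


def python_of(build: str) -> tuple:
    """Extract (major, minor) from a build string such as 'py38h8854b6b_0'."""
    match = re.match(r"py(\d)(\d+)", build)
    return (int(match.group(1)), int(match.group(2)))


# An environment has a single Python, so a numpy 1.18.5 build could only be
# paired with a numpy-2.0-built package if it targeted Python >= 3.9.
co_installable = [
    build for build in BUILDS_1_18_5
    if python_of(build) >= MIN_PYTHON_FOR_NUMPY_2
]
print(co_installable)  # []
```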
For example, we're still building for py38
There cannot be a numpy 2.0 package for py38, so this isn't relevant.
So we need to deal with this.
I'm still convinced that this is not true - it does not need dealing with explicitly in conda-forge recipes because it cannot go wrong. It's perfectly okay for the conda-forge cvxpy version to have >=1.15 in its metadata, that will work for any actual numpy package built by conda-forge.