
NumPy 2 bringup

Open jakirkham opened this issue 2 years ago • 66 comments

NumPy is currently working on a NumPy 2.0 release, which is planned to come out later this year. Here are the current (draft) release notes. Also here's the upstream tracking issue ( https://github.com/numpy/numpy/issues/24300 ), and ecosystem compatibility tracker.

Some questions worth discussing:

  • How do we handle tightening loose numpy pins in packages ( https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/issues/516 )?
  • Do we want to keep building for NumPy 1 & 2 at the same time?
  • What timeline do we want to use for adding NumPy 2?
  • What timeline do we want to use for dropping NumPy 1?
  • How should this be coordinated around other major migrations (like Python 3.12 or a new Boost)?
  • NumPy now builds for the "oldest-available-C-API" by default, i.e. we could build for our current 1.22 C-API using numpy 1.25. We need to adapt to this in any case, at the latest when our lower bound becomes 1.25 (see https://github.com/conda-forge/conda-forge-pinning-feedstock/issues/4816).
  • Anything else we should discuss?

Todos:

  • [x] https://github.com/conda-forge/numpy-feedstock/pull/313
  • [x] https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/pull/728
  • [x] https://github.com/regro/cf-scripts/issues/2469
  • [x] https://github.com/regro/cf-scripts/pull/2470
  • [x] https://github.com/conda-forge/conda-smithy/issues/1911
  • [x] https://github.com/conda-forge/staged-recipes/pull/26188
  • [x] https://github.com/conda-forge/numpy-feedstock/pull/314
  • [x] https://github.com/conda-forge/conda-forge-pinning-feedstock/pull/5790
  • [x] https://github.com/conda-forge/conda-forge-pinning-feedstock/pull/5851
  • [ ] https://github.com/conda-forge/conda-forge.github.io/issues/2156

cc @conda-forge/core

jakirkham avatar Aug 22 '23 21:08 jakirkham

See also https://github.com/conda-forge/conda-forge-pinning-feedstock/issues/4816

h-vetinari avatar Aug 22 '23 21:08 h-vetinari

Can we merge these two issues just to make it easier to track them?

ocefpaf avatar Aug 22 '23 21:08 ocefpaf

The issue Axel raises seems like a subpoint of this issue (depending on what we decide). Namely, do we want to opt in to this newer/slimmer ABI, and how does that fit into NumPy 2?

jakirkham avatar Aug 22 '23 21:08 jakirkham

Sure. IMO Axel's issue is a subset of this one. I don't have strong opinions on which one to keep, or if you want to keep both, but I also don't want to get lost on two mega-threads :grimacing:

ocefpaf avatar Aug 22 '23 21:08 ocefpaf

Added Axel's item to the list above

jakirkham avatar Aug 22 '23 21:08 jakirkham

Handling the ABI is the key point here (that and current packages missing a <2). I updated the added item because the summary was not accurate.

Normally I'd say we do a dual migration (keep 1.x; add 2.0), but numpy has around 5000** dependents in conda-forge, so that would be a pretty substantial CI impact, especially if it takes a while to drop 1.x.

**

>mamba repoquery whoneeds numpy -c conda-forge -p linux-64 > tmp
># edit to remove header
>python
>>> q = open("tmp", "r").readlines()
>>> p = {x.strip().split(" ")[0] for x in q} - {""}
>>> len(p)
4898

Obviously not all of them are compiling against numpy, but still...
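The counting step in the footnote can be wrapped in a small function. This is a minimal sketch, assuming the repoquery output has already been read into a list of lines with the header removed, as in the session above. The name count_dependents and the sample rows are made up for illustration and are not real repoquery output.

```python
def count_dependents(lines):
    """Count unique package names, taking the first field of each non-empty line."""
    names = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            names.add(fields[0])
    return len(names)


# Hypothetical rows in a "name version build channel" layout.
sample = [
    "pkg-a  1.0  h0_0  conda-forge",
    "pkg-a  1.1  h0_0  conda-forge",
    "pkg-b  2.0  h0_0  conda-forge",
    "",
]
print(count_dependents(sample))  # 2
```

Several builds of the same package collapse into one entry, which is why the result counts packages rather than artifacts.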

h-vetinari avatar Aug 22 '23 22:08 h-vetinari

I updated the added item because the summary was not accurate.

Thanks Axel! 🙏

Anyone should feel free to update the issue as needed 🙂

jakirkham avatar Aug 22 '23 22:08 jakirkham

Following up on our discussion earlier about improving the visibility of NPY_FEATURE_VERSION, started this NumPy PR ( https://github.com/numpy/numpy/pull/24861 ) to message how the value is set

Also included a note about one approach we might take to ensure that value is embedded in built binaries, though maybe there are better approaches for that portion

jakirkham avatar Oct 04 '23 18:10 jakirkham

There should now be a string baked into binaries built with NumPy to notate what kind of NumPy compatibility they have

https://github.com/numpy/numpy/pull/25948

jakirkham avatar Mar 07 '24 19:03 jakirkham

It is worth noting that thanks to Axel and others we now have NumPy 2.0.0rc1 packages: https://github.com/conda-forge/numpy-feedstock/issues/311

Also ecosystem support of NumPy 2 is being tracked in this issue Ralf opened: https://github.com/numpy/numpy/issues/26191

We are now in a good spot to start testing building packages with NumPy 2

jakirkham avatar Apr 01 '24 21:04 jakirkham

I discussed this with @rgommers recently and one important point that he brought up is the situation with pin_compatible, which we'll have to fix as part of any migration effort, probably with a piggyback migrator, since we'll need to rewrite the recipes.

In particular, since numpy isn't separated into a library and run-time component, we don't have a run-export, and so feedstocks use pin_compatible under run:. However this will be doubly incorrect in the new setup - for one as our NPY_FEATURE_VERSION (which forms the lower bound) will be lower than the one at build time, and second because the upper bound should be something like <2.{{ numpy.split(".")[1] | int + 3 }} (for a project that's free of deprecation warnings; anything else might be deprecated in 2.{N + 1} and removed after two releases in 2.{N + 3}).
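The upper-bound arithmetic from the Jinja expression above can be spelled out in plain Python. This is a rough sketch under the stated assumption of a project free of deprecation warnings. The function name numpy_upper_bound and the window parameter are hypothetical and not part of any conda-forge tooling.

```python
def numpy_upper_bound(build_version: str, window: int = 3) -> str:
    """Return an upper-bound spec such as '<2.3' for a numpy 2.x build version.

    Mirrors `<2.{N + 3}`: something deprecated in 2.{N + 1} would be removed
    two releases later, in 2.{N + 3}.
    """
    major, minor = build_version.split(".")[:2]
    return f"<{major}.{int(minor) + window}"


print(numpy_upper_bound("2.0.0"))  # <2.3
print(numpy_upper_bound("2.1.0"))  # <2.4
```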

h-vetinari avatar Apr 01 '24 22:04 h-vetinari

In particular, since numpy isn't separated into a library and run-time component, we don't have a run-export [...]

Of course, if there's appetite for a split into libnumpy (with a run-export) and numpy (the python bits), that might be worth a thought as well. But then even more so, we'd need a piggyback.

h-vetinari avatar Apr 01 '24 23:04 h-vetinari

Of course, if there's appetite for a split into libnumpy (with a run-export) and numpy (the python bits), that might be worth a thought as well.

That doesn't sound good to me as a custom conda-forge split. If we want anything like that, let's do this properly and create a numpy-headers package that's officially supported by NumPy and that can be used by anyone (unless they need a static library or numpy.f2py) with a build-time dependency on the NumPy C API. We actually discussed this in a NumPy community meeting, and it seems feasible.

rgommers avatar Apr 02 '24 08:04 rgommers

In particular, since numpy isn't separated into a library and run-time component, we don't have a run-export, and so feedstocks use pin_compatible under run:

We do have a run_export on numpy.

isuruf avatar Apr 02 '24 14:04 isuruf

Yeah... Clearly I shouldn't be writing these comments from a phone 😅

I misremembered that part, but in that case the task becomes easier - we just set up the right run export in numpy itself, and then remove pin_compatible in feedstocks that compile against numpy. Right?

h-vetinari avatar Apr 02 '24 17:04 h-vetinari

Another question we have to answer soon: what mechanism do we want to use for setting NPY_FEATURE_VERSION... Perhaps the easiest would be an activation script in numpy, but that's a fairly big hammer, as it persists beyond build time and into all user environments.

Right now I'm thinking of setting NPY_FEATURE_VERSION in the global pinning (cleanly overrideable per feedstock where necessary), and then using that in conda-forge-ci-setup to populate the environment variable that numpy will pick up (and if necessary, in the compiler activation feedstocks, e.g. for CFLAGS).

The only issue there is that the run-export on numpy is not dynamic, in the sense that it gets fixed to the value of NPY_FEATURE_VERSION at the build time of numpy, and not the (potentially different) one in play when building something else against numpy. Unless I'm overlooking something, we'd therefore need to replace rather than remove the existing uses of pin_compatible("numpy") with something like

- {{ pin_compatible("numpy", lower_bound=NPY_FEATURE_VERSION) }}

while the upper bound (<2.{N + 3}) would be set by the run-export on numpy.
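To make the proposal concrete, here is a sketch of how the two halves of the constraint would combine: the lower bound from a global pinning value (overridable per feedstock) and the upper bound from the numpy version in the host environment. The dictionary GLOBAL_PINNING, its value of 1.22, and the function run_requirement are assumptions for illustration only. In practice this would be expressed through pin_compatible and the run-export rather than Python code.

```python
# Hypothetical global pinning entry; 1.22 is only an illustrative value.
GLOBAL_PINNING = {"NPY_FEATURE_VERSION": "1.22"}


def run_requirement(build_version, feature_override=None):
    """Build the run requirement a package compiled against numpy would carry."""
    # Lower bound: the feedstock override wins over the global pinning.
    lower = feature_override or GLOBAL_PINNING["NPY_FEATURE_VERSION"]
    # Upper bound: <2.{N + 3}, derived from the build-time numpy version.
    major, minor = (int(part) for part in build_version.split(".")[:2])
    return f"numpy >={lower},<{major}.{minor + 3}"


print(run_requirement("2.0.0"))                           # numpy >=1.22,<2.3
print(run_requirement("2.0.0", feature_override="1.25"))  # numpy >=1.25,<2.3
```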

h-vetinari avatar Apr 02 '24 22:04 h-vetinari

What if we had something like...?

{% set version = "2.0.0" %}

package:
  name: numpy
  version: {{ version }}

...

build:
  ...
  run_exports:
    - {{ pin_subpackage("numpy", lower_bound=os.environ.get("NPY_FEATURE_VERSION", version)) }}

That way we can defer this environment variable setting to packages

If they don't set something, we can provide a sensible default (either version or something else we decide)

We could also consider whether conda-build could allow NPY_FEATURE_VERSION to be a pass-through environment variable, or if we handle that within conda-forge with some recipe changes to pass it through ourselves. This would let us use a global setting (as you suggest)

jakirkham avatar Apr 03 '24 04:04 jakirkham

I don't think this type of NPY_FEATURE_VERSION setting is useful at all. NumPy guarantees to set it to a version that is not higher than the first numpy release that supported the Python minor version being built for. So all produced extension modules will work with all possible numpy versions that can actually be installed.

Hence, doing nothing should be the right default here, trying to change it from NumPy's default will likely only be the cause of extra complexity/confusion, and perhaps bugs.

rgommers avatar Apr 03 '24 05:04 rgommers

That way we can defer this environment variable setting to packages

I'd be surprised if it works like that. AFAIU, that os.environ call will be resolved while building numpy.
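The concern can be illustrated with a toy model of recipe rendering. This is only a sketch of the reading given above (AFAIU) and does not reproduce actual conda-build behaviour. The function render_run_export is hypothetical, and the real pin_subpackage output would also carry an upper bound.

```python
def render_run_export(version, environ):
    """Toy stand-in for rendering the run_exports line of the numpy recipe."""
    lower = environ.get("NPY_FEATURE_VERSION", version)
    return f"numpy >={lower}"


# Rendering happens once, with the environment present while numpy is built.
baked = render_run_export("2.0.0", {"NPY_FEATURE_VERSION": "1.22"})

# A downstream feedstock setting a different value later only ever sees the
# already-rendered string; nothing re-evaluates the lookup.
downstream_environ = {"NPY_FEATURE_VERSION": "1.25"}
print(baked)  # numpy >=1.22
```

Under this model the downstream value never reaches the run-export, which is the behaviour the objection anticipates.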

I don't think this type of NPY_FEATURE_VERSION setting is useful at all. NumPy guarantees to set it to a version that is not higher than the first numpy release that supported the Python minor version being built for.

Leaving aside NEP29, this is a quantity we have to be able to control IMO. Otherwise our metadata for packages building against numpy is bound to be wrong, and deteriorate over time (when numpy inevitably moves the lower bound, and we don't move in sync across all our feedstocks). I don't see how we can reasonably avoid making NPY_FEATURE_VERSION explicit in conda-forge in some way.

h-vetinari avatar Apr 03 '24 06:04 h-vetinari

I very well could be wrong. It is easy to test

Really we just need more ideas to sample from. It's more important that we have a large variety before selecting one. So feel free to propose more

jakirkham avatar Apr 03 '24 06:04 jakirkham

Otherwise our metadata for packages building against numpy is bound to be wrong, and deteriorate over time (when numpy inevitably moves the lower bound, and we don't move in sync across all our feedstocks).

It won't be wrong. The metadata that is in the upstream releases (i.e. the dependencies key in pyproject.toml files) is going to be updated by package authors, and that's the thing that should be relied on by conda-forge builds. The build-time version of numpy is now simply taken off the table completely, it no longer adds an extra constraint.

rgommers avatar Apr 03 '24 06:04 rgommers

I'm not sure I follow. Say a project has numpy >=1.24,<2.3 in its pyproject.toml, is there some sort of hook that populates NPY_FEATURE_VERSION to 1.24? If so, how would that constraint arrive in the metadata of the packages we build?

Or do you mean that the default for that in numpy is so low (1.19?) that it won't ever become a factor? That seems doubtful to me.

Even aside from those questions, we still have an interest to provide a common baseline for numpy compatibility (so that most of conda-forge is still usable with the oldest supported numpy), and avoid that individual packages move on too quickly (unless they really need to), or extremely slowly (i.e. going back to 1.19 adds about 2 years on top of what NEP29 foresees w.r.t. being able to use a given ABI feature).

The build-time version of numpy is now simply taken off the table completely, it no longer adds an extra constraint.

In summary, this seems highly dubious to me. There's still a lower bound somewhere, either in numpy's default feature level, or in an explicit override of NPY_FEATURE_VERSION. However it comes to be, we should represent that lower bound in our package metadata exactly (or at the very least, something tighter).

h-vetinari avatar Apr 03 '24 08:04 h-vetinari

Or do you mean that the default for that in numpy is so low (1.19?) that it won't ever become a factor? That seems doubtful to me.

This. And it's not doubtful, it is guaranteed to work. The whole point is to take away build-time version as a thing that dynamically overrides the declared runtime dependency range.

Even aside from those questions, we still have an interest to provide a common baseline for numpy compatibility (so that most of conda-forge is still usable with the oldest supported numpy), and avoid that individual packages move on too quickly

No, that does not make sense. If a package has numpy>=x.y in its constraints, you cannot just ignore that. The package author bumped the lower version for some reason, so if you tweak the metadata to say numpy>=x.y-N instead, you will allow a broken combination of packages.

In summary, this seems highly dubious to me. There's still a lower bound somewhere, either in numpy's default feature level, or in an explicit override of NPY_FEATURE_VERSION. However it comes to be, we should represent that lower bound in our package metadata exactly (or at the very least, something tighter).

No, and no. The lower bound is whatever dependencies=numpy... in pyproject.toml says, or it's a bug in the package (even if the package internally sets NPY_FEATURE_VERSION, which should be quite rare).

What the conda-forge tooling should do is check that the meta.yaml and pyproject.toml metadata is consistent - and I think that is a feature already present for regular Python packages. I.e., start treating numpy like any other Python package when building against numpy 2.x.

rgommers avatar Apr 03 '24 08:04 rgommers

No, that does not make sense.

You chopped off the part of my quote that accounts for the scenario you describe.

The lower bound is whatever dependencies=numpy... in pyproject.toml says

I'm not saying we should disregard runtime constraints. I'm saying we also need to express constraints arising from the feature level - both of those can be attached to the same package without conflict. They stack and the intersection of both is what's actually permissible for the final artefact.
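For lower bounds, "stacking" amounts to taking the stricter of the two. A minimal sketch, assuming plain dotted numeric versions. The helper names are hypothetical, and the values 1.24 and 1.19 are taken from earlier in the thread purely as examples.

```python
def parse(version):
    """Turn '1.24' into (1, 24) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def effective_lower_bound(declared, feature_level):
    """Intersect two '>=' constraints: the higher bound is the one that applies."""
    return max(declared, feature_level, key=parse)


# Declared runtime bound above the feature level: the declared bound applies.
print(effective_lower_bound("1.24", "1.19"))  # 1.24
# Declared runtime bound below the feature level: the feature level applies.
print(effective_lower_bound("1.15", "1.19"))  # 1.19
```

The second case is the one under dispute in this thread: whether a declared bound below the feature level can ever matter in practice.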

What the conda-forge tooling should do is check that the environment.yml and pyproject.toml metadata is consistent

I don't see this happening soon enough to be available for the 2.0 transition, it would need work on conda-build AFAICT.

and I think that is a feature already present for regular Python packages. I.e., start treating numpy like any other Python package when building against numpy 2.x.

I'm not sure what you mean here. Presumably by "python package" you don't mean "pure python" packages? Anything else that has a run-export (to my knowledge) uses the build-time version as a lower bound. That's precisely the issue that requires attention here, because of the very unusual situation where building against numpy 2.0 produces something compatible with >=1.x.

h-vetinari avatar Apr 03 '24 09:04 h-vetinari

I'm saying we also need to express constraints arising from the feature level - both of those can be attached to the same package without conflict. They stack and the intersection of both is what's actually permissible for the final artefact.

What I am trying to explain is that that stacking is not doing anything, because

  • numpy will never set the feature version in a way that allows for this to have any effect, and
  • if a package does this internally by setting NPY_FEATURE_VERSION to something higher than what it says in its pyproject.toml, that's a bug in the package and should be fixed there by fixing its dependencies= metadata.

I'm not sure what you mean here. Presumably by "python package" you don't mean "pure python" packages?

Ah, I did mean this since I remember dependencies being flagged on PRs - but it may not be ready indeed, since it's marked as experimental:

[two screenshots]

So it's still mostly manual then, depending on the feedstock maintainers to keep pyproject.toml and meta.yaml in sync?

rgommers avatar Apr 03 '24 12:04 rgommers

For clarification, should environment.yml here be the recipe's meta.yaml? Or do you mean something else Ralf?

There are different levels of bot inspection or automation. However this is opt-in at this point. It is seeing some use in conda-forge, but we are probably not at the point where we could turn this on by default. Though that's a separate discussion I think

jakirkham avatar Apr 03 '24 18:04 jakirkham

For clarification, should environment.yml here be the recipe's meta.yaml? Or do you mean something else Ralf?

Yes indeed. General tiredness 🤦🏼. Editing my comment to say meta.yaml to avoid further confusion.

rgommers avatar Apr 03 '24 18:04 rgommers

Thanks Ralf! 🙏 All good. Appreciate hearing your insights and having your support here 🙂

Can imagine there are a lot of spinning plates with this work 😅

jakirkham avatar Apr 03 '24 20:04 jakirkham

What I am trying to explain is that that stacking is not doing anything, because

  • numpy will never set the feature version in a way that allows for this to have any effect, and

I'm still not sure we're speaking the same language here. We'll have numpy 2.0 in the host environment, and we need to create a lower bound for the numpy run-export, i.e. numpy >=1.x. Where should this x come from?

We really need to have correct metadata, because the solver will flee into the past if we don't close off incorrect avenues. For example, we're still building for py38, and have numpy 1.18 builds for that. Anything built against numpy 2.0 (assuming I understood correctly that the default ABI level is going to be 1.19) needs to be impossible to install with 1.18, hence needs the right constraints.

You're saying (effectively) that people's numpy dependencies= in their pyproject.toml are always going to be higher than the default NPY_FEATURE_VERSION, and that's an assumption that I don't think will hold. As the first (I swear...) random example I looked at, cvxpy HEAD uses numpy >=1.15. So we need to deal with this.

And once we deal with this, it's IMO not a good idea to just define a fixed run-export in numpy itself, because packages will want to override this - for example, if they need an ABI feature that's newer than the default of 1.19 for whatever reason. Hence why I'm tending towards

- {{ pin_compatible("numpy", lower_bound=NPY_FEATURE_VERSION) }}

per feedstock, with NPY_FEATURE_VERSION part of the global pinning (and overrideable per feedstock) instead of numpy.

h-vetinari avatar Apr 09 '24 08:04 h-vetinari

Anything built against numpy 2.0 (assuming I understood correctly that the default ABI level is going to be 1.19) needs to be impossible to install with 1.18, hence needs the right constraints.

It is impossible. The lowest Python version supported by NumPy 2.0 is 3.9, and there is no numpy 1.18 package for Python 3.9 on either conda-forge or PyPI:

$ mamba search numpy=1.18.5
Loading channels: done
# Name                       Version           Build  Channel             
numpy                         1.18.5  py36h7314795_0  conda-forge         
numpy                         1.18.5  py36he0f5f23_0  conda-forge         
numpy                         1.18.5  py37h8960a57_0  conda-forge         
numpy                         1.18.5  py38h8854b6b_0  conda-forge

So yes, by design of how numpy sets the default targeted C API, it is impossible to get an incompatible combination here, even in an example like cvxpy where they set their lower bound to 1.15 (that could well be valid if they still support Python 3.7, no idea).
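The argument can be checked mechanically against the search output above: extract the Python tag from each numpy 1.18.5 build string and see whether any of them is at or above 3.9, the lowest Python that NumPy 2.0 supports. A small sketch; the helper names are made up, and the build strings are copied from the listing.

```python
import re

# Build strings of numpy 1.18.5 on conda-forge, as listed above.
BUILDS_1_18_5 = [
    "py36h7314795_0",
    "py36he0f5f23_0",
    "py37h8960a57_0",
    "py38h8854b6b_0",
]
MIN_PYTHON_FOR_NUMPY_2 = (3, 9)


def python_versions(builds):
    """Extract (major, minor) Python versions from 'pyXY...' build strings."""
    versions = set()
    for build in builds:
        match = re.match(r"py(\d)(\d+)", build)
        if match:
            versions.add((int(match.group(1)), int(match.group(2))))
    return versions


def coinstallable_with_numpy_2_builds(builds):
    """True if any build targets a Python that NumPy 2.0 also supports."""
    return any(v >= MIN_PYTHON_FOR_NUMPY_2 for v in python_versions(builds))


print(sorted(python_versions(BUILDS_1_18_5)))            # [(3, 6), (3, 7), (3, 8)]
print(coinstallable_with_numpy_2_builds(BUILDS_1_18_5))  # False
```

Since no 1.18.5 build exists for Python 3.9 or later, an extension built against numpy 2.0 can never end up in the same environment as numpy 1.18.5.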

For example, we're still building for py38

There cannot be a numpy 2.0 package for py38, so this isn't relevant.

So we need to deal with this.

I'm still convinced that this is not true - it does not need dealing with explicitly in conda-forge recipes because it cannot go wrong. It's perfectly okay for the conda-forge cvxpy version to have >=1.15 in its metadata, that will work for any actual numpy package built by conda-forge.

rgommers avatar Apr 10 '24 06:04 rgommers