python310Packages.jax: 0.3.6 -> 0.3.15
Description of changes
Upgrades jax and jaxlib to 0.3.15.
~Still a draft because this depends on https://github.com/NixOS/nixpkgs/pull/183052.~
Note: This marks the darwin build as broken because jax>=0.3.14 sets macos_minimum_os=10.14, but as far as I can tell 10.12 is the latest supported for x86_64 in nixpkgs (relevant issue: https://github.com/NixOS/nixpkgs/issues/101229).
Things done
- Built on platform(s)
- [x] x86_64-linux
- [ ] aarch64-linux
- [ ] x86_64-darwin
- [ ] aarch64-darwin
- [ ] For non-Linux: Is
sandbox = trueset innix.conf? (See Nix manual) - [ ] Tested, as applicable:
- NixOS test(s) (look inside nixos/tests)
- and/or package tests
- or, for functions and "core" functionality, tests in lib/tests or pkgs/test
- made sure NixOS tests are linked to the relevant packages
- [ ] Tested compilation of all packages that depend on this change using
nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage - [x] Tested basic functionality of all binary files (usually in
./result/bin/) -
22.11 Release Notes (or backporting 22.05 Release notes)
- [ ] (Package updates) Added a release notes entry if the change is major or breaking
- [ ] (Module updates) Added a release notes entry if the change is significant
- [ ] (Module addition) Added a release notes entry if adding a new NixOS module
- [ ] (Release notes changes) Ran
nixos/doc/manual/md-to-db.shto update generated release notes
- [x] Fits CONTRIBUTING.md.
Status update: I'm stuck trying to get this to build on x86_64 darwin, getting errors like the following:
external/org_tensorflow/tensorflow/stream_executor/allocator_stats.h:45:3: error: no template named 'optional' in namespace 'std'; did you mean 'absl::optional'?
I'm less familiar with darwin, but it seems like this might be an issue with nixpkgs using an unsupported darwin SDK version for x86_64. Since v0.3.14, jax has bumped macos_minimum_os to 10.14, but as far as I can tell nixpkgs currently only supports 10.12 for x86_64 (relevant issue: https://github.com/NixOS/nixpkgs/issues/101229).
For now I'm going to mark this broken on darwin since I haven't been able to figure out a workaround.
The darwin build might be ok after https://github.com/NixOS/nixpkgs/pull/184395, since that updates the package to use SDK 11.0, which is available in nixpkgs, it's just not the default one.
The darwin build might be ok after https://github.com/NixOS/nixpkgs/pull/184395, since that updates the package to use SDK 11.0, which is available in nixpkgs, it's just not the default one.
Once we get https://github.com/NixOS/nixpkgs/pull/184395 merged, we should def rebase this on top of that
It looks like the ofborg build is failing with
error: building of '/nix/store/mb0mnbqh2n6lh55krxahirdcqfn06crw-python3.10-jax-0.3.15.drv!out' from .drv file timed out after 3600 seconds
Is there a way to increase this timeout?
Once we get #184395 merged, we should def rebase this on top of that
now merged! @mcwitt could you please rebase on top of that and then we can get this PR moving?
jax 0.3.16 is available now https://github.com/google/jax/releases/tag/jax-v0.3.16
now merged! @mcwitt could you please rebase on top of that and then we can get this PR moving?
Just FYI, I'll be traveling for a few days and won't have a chance to work on this again before next week. If anyone needs it sooner, feel free to take over! Otherwise I should have a chance to rebase and upgrade to 0.3.16 next week.
Just FYI, I'll be traveling for a few days and won't have a chance to work on this again before next week. If anyone needs it sooner, feel free to take over! Otherwise I should have a chance to rebase and upgrade to 0.3.16 next week.
No worries, safe travels @mcwitt! There's no rush AFAIK
Update: looks like jax is not a required dependency of etils anyhow?
https://github.com/NixOS/nixpkgs/blob/6b34476e8009bbc4eeab0b0b52065c5ea6cf4537/pkgs/development/python-modules/etils/default.nix#L45-L59
Update: looks like jax is not a required dependency of etils anyhow?
@samuela correct, but jax is a test dependency of etils: https://github.com/NixOS/nixpkgs/blob/1b74710c69c2ba3b693ebae342a4bfdfd7a0c1e5/pkgs/development/python-modules/etils/default.nix#L75
This is not actually an issue right now as the etils tests are currently disabled (until https://github.com/NixOS/nixpkgs/pull/186690 is merged). I get your point that it seems like this workaround should've been added to etils instead of here. So maybe best to remove the workaround here and instead break the cycle in etils when we enable tests there?
I went ahead and reverted the change that disabled checks and added to passthru.tests. (In case it's helpful when reviewing, the unsquashed branch is here: https://github.com/mcwitt/nixpkgs/tree/upgrade-jax)
Result of nixpkgs-review pr 183051 run on x86_64-linux 1
32 packages failed to build:
- python310Packages.aeppl
- python310Packages.aesara
- python310Packages.chex
- python310Packages.dalle-mini
- python310Packages.distrax
- python310Packages.dm-haiku
- python310Packages.dm-sonnet
- python310Packages.elegy
- python310Packages.flax
- python310Packages.jmp
- python310Packages.optax
- python310Packages.pymc
- python310Packages.rlax
- python310Packages.tensorflow-datasets
- python310Packages.treex
- python310Packages.vqgan-jax
- python39Packages.aeppl
- python39Packages.aesara
- python39Packages.chex
- python39Packages.dalle-mini
- python39Packages.distrax
- python39Packages.dm-haiku
- python39Packages.dm-sonnet
- python39Packages.elegy
- python39Packages.flax
- python39Packages.jmp
- python39Packages.optax
- python39Packages.pymc
- python39Packages.rlax
- python39Packages.tensorflow-datasets
- python39Packages.treex
- python39Packages.vqgan-jax
16 packages built:
- python310Packages.arviz
- python310Packages.augmax
- python310Packages.jax
- python310Packages.jaxlib (python310Packages.jaxlib-build ,python310Packages.jaxlibWithoutCuda)
- python310Packages.jaxlibWithCuda
- python310Packages.numpyro
- python310Packages.objax
- python310Packages.treeo
- python39Packages.arviz
- python39Packages.augmax
- python39Packages.jax
- python39Packages.jaxlib (python39Packages.jaxlib-build ,python39Packages.jaxlibWithoutCuda)
- python39Packages.jaxlibWithCuda
- python39Packages.numpyro
- python39Packages.objax
- python39Packages.treeo
Result of nixpkgs-review pr 183051 run on x86_64-linux 1
I skimmed over the breakages. Many seem to be caused by the following:
- chex: broken by upstream removal of
tree_multimap. Fixed by upgrading to latest (0.1.4) - jmp: broken by upstream removal of
tree_multimap. Fixed by upgrading to latest (https://github.com/deepmind/jmp/commit/260e5ba01f46b10c579a61393e6c7e546aeae93e) - distrax: many tests fail; upgrade to latest not possible since it adds a
numpy<1.23version bound - tensorflow-datasets: long-running test suite; not sure if it'll be fixed by the above fixes or a version bump
The others seem to be broken by unrelated issues (mostly incompatible scipy and rich versions).
Unfortunately rebasing with https://github.com/NixOS/nixpkgs/pull/184395 merged hasn't magically fixed the darwin build.
- ~Bazel fails to find
libtool.~ I think this is fixed by addingcctoolstonativeBuildInputs(https://github.com/mcwitt/nixpkgs/commit/4f8786c6d02dd1690ee74d72610c4c9ceb9ec23a) -
ERROR: /private/tmp/nix-build-bazel-build-jaxlib-0.3.15.drv-0/output/external/llvm-project/mlir/BUILD.bazel:6549:11: Compiling mlir/lib/TableGen/Argument.cpp failed: undeclared inclusion(s) in rule '@llvm-project//mlir:TableGen'(log). @uri-canva do you have any insight here?
I have not seen the undeclared inclusion(s) in rule in my testing. Did you test if jaxlib works on its own without this branch? Maybe it's already broken in master.
It shows as successful 9 hours ago: https://hydra.nixos.org/eval/1777705?filter=jaxlib&compare=1777668&full=
Oh I see, it's probably some new transitive dependency from the version upgrade.
Seems to be a known issue with bazel: https://github.com/tensorflow/tensorflow/issues/55563 https://github.com/bazelbuild/bazel/issues/15359
The workarounds might be a bit hard for us though, not sure how well gcc works on darwin, but we can try.
I skimmed over the breakages. Many seem to be caused by the following:
Thanks for digging into these failures @mcwitt!
-
tree_multimapissues (affecting chex, jmp): I'm ok letting these slide for now as long as someone promises to come back and do the necessary upgrades after this PR is merged. - distrax: appears to be a pre-existing failure on master: https://hydra.nixos.org/build/188060343
- tensorflow-datasets: this is an existing failures on master: https://github.com/samuela/nixpkgs-upkeep/runs/8006479894?check_suite_focus=true
Unfortunately rebasing with https://github.com/NixOS/nixpkgs/pull/184395 merged hasn't magically fixed the darwin build.
Just to confirm: @mcwitt are you building on x86_64-darwin?
Just to confirm: @mcwitt are you building on x86_64-darwin?
@samuela yes, the undeclared inclusion(s) error occurs for x86_64-darwin. Here's the ofborg build log: https://logs.nix.ci/?key=nixos/nixpkgs.183051&attempt_id=fe336da2-c899-425e-b4fc-a0e9328d45b0
It appears that python310Packages.jax.x86_64-darwin is also failing on master but due to a different failure: https://hydra.nixos.org/build/188060892/nixlog/1
Is there any hope trying to fix or patch the x86_64-darwin bug ourselves? It appears to be a quite hairy issue in bazel IIUC... Perhaps we should also raise it with the JAX folks?
I'm ok letting these slide for now as long as someone promises to come back and do the necessary upgrades after this PR is merged.
@samuela I'd be happy to handle upgrading chex and jmp after this is merged (already have the updates building in local branches so it shouldn't be too much work 🤞 )
@mcwitt @uri-canva How do ya'll feel about merging this as is, despite the fact that it breaks the x86_64-darwin build? On the one hand I hate to break the build... OTOH a) this breakage isn't our fault, b) I don't want PRs to get stuck forever. wdyt?
Yeah I think it's ok, I only just fixed the darwin build, and I'm not sure how long it was broken before that, so I don't think there would be anyone depending on it.
Let's mark it as broken and add a comment with all the context / links to the issues for future reference. That way when someone looks at it in the future they won't have to start figuring it out from scratch.
Yeah, I def don't want you to think that the x86_64-darwin work was for nothing @uri-canva! We should def commit to getting darwin support back up and running as soon as upstream progress on this issue allows.
so we can merge this now or not?
I think we can!