nixpkgs icon indicating copy to clipboard operation
nixpkgs copied to clipboard

python310Packages.jax: 0.3.6 -> 0.3.15

Open mcwitt opened this issue 3 years ago • 4 comments

Description of changes

Upgrades jax and jaxlib to 0.3.15.

~Still a draft because this depends on https://github.com/NixOS/nixpkgs/pull/183052.~

Note: This marks the darwin build as broken because jax>=0.3.14 sets macos_minimum_os=10.14, but as far as I can tell 10.12 is the latest supported for x86_64 in nixpkgs (relevant issue: https://github.com/NixOS/nixpkgs/issues/101229).

Things done
  • Built on platform(s)
    • [x] x86_64-linux
    • [ ] aarch64-linux
    • [ ] x86_64-darwin
    • [ ] aarch64-darwin
  • [ ] For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • [ ] Tested, as applicable:
  • [ ] Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • [x] Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • [ ] (Package updates) Added a release notes entry if the change is major or breaking
    • [ ] (Module updates) Added a release notes entry if the change is significant
    • [ ] (Module addition) Added a release notes entry if adding a new NixOS module
    • [ ] (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • [x] Fits CONTRIBUTING.md.

mcwitt avatar Jul 27 '22 03:07 mcwitt

Status update: I'm stuck trying to get this to build on x86_64 darwin, getting errors like the following:

external/org_tensorflow/tensorflow/stream_executor/allocator_stats.h:45:3: error: no template named 'optional' in namespace 'std'; did you mean 'absl::optional'?

I'm less familiar with darwin, but it seems like this might be an issue with nixpkgs using an unsupported darwin SDK version for x86_64. Since v0.3.14, jax has bumped macos_minimum_os to 10.14, but as far as I can tell nixpkgs currently only supports 10.12 for x86_64 (relevant issue: https://github.com/NixOS/nixpkgs/issues/101229).

For now I'm going to mark this broken on darwin since I haven't been able to figure out a workaround.

mcwitt avatar Aug 03 '22 15:08 mcwitt

The darwin build might be ok after https://github.com/NixOS/nixpkgs/pull/184395, since that updates the package to use SDK 11.0, which is available in nixpkgs, it's just not the default one.

uri-canva avatar Aug 06 '22 00:08 uri-canva

The darwin build might be ok after https://github.com/NixOS/nixpkgs/pull/184395, since that updates the package to use SDK 11.0, which is available in nixpkgs, it's just not the default one.

Once we get https://github.com/NixOS/nixpkgs/pull/184395 merged, we should def rebase this on top of that

samuela avatar Aug 06 '22 03:08 samuela

It looks like the ofborg build is failing with

error: building of '/nix/store/mb0mnbqh2n6lh55krxahirdcqfn06crw-python3.10-jax-0.3.15.drv!out' from .drv file timed out after 3600 seconds

Is there a way to increase this timeout?

mcwitt avatar Aug 13 '22 14:08 mcwitt

Once we get #184395 merged, we should def rebase this on top of that

now merged! @mcwitt could you please rebase on top of that and then we can get this PR moving?

samuela avatar Aug 18 '22 02:08 samuela

jax 0.3.16 is available now https://github.com/google/jax/releases/tag/jax-v0.3.16

onny avatar Aug 18 '22 11:08 onny

now merged! @mcwitt could you please rebase on top of that and then we can get this PR moving?

Just FYI, I'll be traveling for a few days and won't have a chance to work on this again before next week. If anyone needs it sooner, feel free to take over! Otherwise I should have a chance to rebase and upgrade to 0.3.16 next week.

mcwitt avatar Aug 19 '22 15:08 mcwitt

Just FYI, I'll be traveling for a few days and won't have a chance to work on this again before next week. If anyone needs it sooner, feel free to take over! Otherwise I should have a chance to rebase and upgrade to 0.3.16 next week.

No worries, safe travels @mcwitt! There's no rush AFAIK

samuela avatar Aug 21 '22 20:08 samuela

Update: looks like jax is not a required dependency of etils anyhow?

https://github.com/NixOS/nixpkgs/blob/6b34476e8009bbc4eeab0b0b52065c5ea6cf4537/pkgs/development/python-modules/etils/default.nix#L45-L59

samuela avatar Aug 23 '22 17:08 samuela

Update: looks like jax is not a required dependency of etils anyhow?

@samuela correct, but jax is a test dependency of etils: https://github.com/NixOS/nixpkgs/blob/1b74710c69c2ba3b693ebae342a4bfdfd7a0c1e5/pkgs/development/python-modules/etils/default.nix#L75

This is not actually an issue right now as the etils tests are currently disabled (until https://github.com/NixOS/nixpkgs/pull/186690 is merged). I get your point that it seems like this workaround should've been added to etils instead of here. So maybe best to remove the workaround here and instead break the cycle in etils when we enable tests there?

I went ahead and reverted the change that disabled checks and added to passthru.tests. (In case it's helpful when reviewing, the unsquashed branch is here: https://github.com/mcwitt/nixpkgs/tree/upgrade-jax)

mcwitt avatar Aug 23 '22 22:08 mcwitt

Result of nixpkgs-review pr 183051 run on x86_64-linux 1

32 packages failed to build:
  • python310Packages.aeppl
  • python310Packages.aesara
  • python310Packages.chex
  • python310Packages.dalle-mini
  • python310Packages.distrax
  • python310Packages.dm-haiku
  • python310Packages.dm-sonnet
  • python310Packages.elegy
  • python310Packages.flax
  • python310Packages.jmp
  • python310Packages.optax
  • python310Packages.pymc
  • python310Packages.rlax
  • python310Packages.tensorflow-datasets
  • python310Packages.treex
  • python310Packages.vqgan-jax
  • python39Packages.aeppl
  • python39Packages.aesara
  • python39Packages.chex
  • python39Packages.dalle-mini
  • python39Packages.distrax
  • python39Packages.dm-haiku
  • python39Packages.dm-sonnet
  • python39Packages.elegy
  • python39Packages.flax
  • python39Packages.jmp
  • python39Packages.optax
  • python39Packages.pymc
  • python39Packages.rlax
  • python39Packages.tensorflow-datasets
  • python39Packages.treex
  • python39Packages.vqgan-jax
16 packages built:
  • python310Packages.arviz
  • python310Packages.augmax
  • python310Packages.jax
  • python310Packages.jaxlib (python310Packages.jaxlib-build ,python310Packages.jaxlibWithoutCuda)
  • python310Packages.jaxlibWithCuda
  • python310Packages.numpyro
  • python310Packages.objax
  • python310Packages.treeo
  • python39Packages.arviz
  • python39Packages.augmax
  • python39Packages.jax
  • python39Packages.jaxlib (python39Packages.jaxlib-build ,python39Packages.jaxlibWithoutCuda)
  • python39Packages.jaxlibWithCuda
  • python39Packages.numpyro
  • python39Packages.objax
  • python39Packages.treeo

samuela avatar Aug 24 '22 01:08 samuela

Result of nixpkgs-review pr 183051 run on x86_64-linux 1

I skimmed over the breakages. Many seem to be caused by the following:

  • chex: broken by upstream removal of tree_multimap. Fixed by upgrading to latest (0.1.4)
  • jmp: broken by upstream removal of tree_multimap. Fixed by upgrading to latest (https://github.com/deepmind/jmp/commit/260e5ba01f46b10c579a61393e6c7e546aeae93e)
  • distrax: many tests fail; upgrade to latest not possible since it adds a numpy<1.23 version bound
  • tensorflow-datasets: long-running test suite; not sure if it'll be fixed by the above fixes or a version bump

The others seem to be broken by unrelated issues (mostly incompatible scipy and rich versions).

mcwitt avatar Aug 24 '22 23:08 mcwitt

Unfortunately rebasing with https://github.com/NixOS/nixpkgs/pull/184395 merged hasn't magically fixed the darwin build.

  • ~Bazel fails to find libtool.~ I think this is fixed by adding cctools to nativeBuildInputs (https://github.com/mcwitt/nixpkgs/commit/4f8786c6d02dd1690ee74d72610c4c9ceb9ec23a)
  • ERROR: /private/tmp/nix-build-bazel-build-jaxlib-0.3.15.drv-0/output/external/llvm-project/mlir/BUILD.bazel:6549:11: Compiling mlir/lib/TableGen/Argument.cpp failed: undeclared inclusion(s) in rule '@llvm-project//mlir:TableGen' (log). @uri-canva do you have any insight here?

mcwitt avatar Aug 25 '22 00:08 mcwitt

I have not seen the undeclared inclusion(s) in rule in my testing. Did you test if jaxlib works on its own without this branch? Maybe it's already broken in master.

uri-canva avatar Aug 25 '22 00:08 uri-canva

It shows as successful 9 hours ago: https://hydra.nixos.org/eval/1777705?filter=jaxlib&compare=1777668&full=

uri-canva avatar Aug 25 '22 00:08 uri-canva

Oh I see, it's probably some new transitive dependency from the version upgrade.

uri-canva avatar Aug 25 '22 00:08 uri-canva

Seems to be a known issue with bazel: https://github.com/tensorflow/tensorflow/issues/55563 https://github.com/bazelbuild/bazel/issues/15359

uri-canva avatar Aug 25 '22 00:08 uri-canva

The workarounds might be a bit hard for us though, not sure how well gcc works on darwin, but we can try.

uri-canva avatar Aug 25 '22 00:08 uri-canva

I skimmed over the breakages. Many seem to be caused by the following:

Thanks for digging into these failures @mcwitt!

  • tree_multimap issues (affecting chex, jmp): I'm ok letting these slide for now as long as someone promises to come back and do the necessary upgrades after this PR is merged.
  • distrax: appears to be a pre-existing failure on master: https://hydra.nixos.org/build/188060343
  • tensorflow-datasets: this is an existing failures on master: https://github.com/samuela/nixpkgs-upkeep/runs/8006479894?check_suite_focus=true

Unfortunately rebasing with https://github.com/NixOS/nixpkgs/pull/184395 merged hasn't magically fixed the darwin build.

Just to confirm: @mcwitt are you building on x86_64-darwin?

samuela avatar Aug 25 '22 01:08 samuela

Just to confirm: @mcwitt are you building on x86_64-darwin?

@samuela yes, the undeclared inclusion(s) error occurs for x86_64-darwin. Here's the ofborg build log: https://logs.nix.ci/?key=nixos/nixpkgs.183051&attempt_id=fe336da2-c899-425e-b4fc-a0e9328d45b0

mcwitt avatar Aug 25 '22 14:08 mcwitt

It appears that python310Packages.jax.x86_64-darwin is also failing on master but due to a different failure: https://hydra.nixos.org/build/188060892/nixlog/1

samuela avatar Aug 25 '22 19:08 samuela

Is there any hope trying to fix or patch the x86_64-darwin bug ourselves? It appears to be a quite hairy issue in bazel IIUC... Perhaps we should also raise it with the JAX folks?

samuela avatar Aug 25 '22 19:08 samuela

I'm ok letting these slide for now as long as someone promises to come back and do the necessary upgrades after this PR is merged.

@samuela I'd be happy to handle upgrading chex and jmp after this is merged (already have the updates building in local branches so it shouldn't be too much work 🤞 )

mcwitt avatar Aug 25 '22 23:08 mcwitt

@mcwitt @uri-canva How do ya'll feel about merging this as is, despite the fact that it breaks the x86_64-darwin build? On the one hand I hate to break the build... OTOH a) this breakage isn't our fault, b) I don't want PRs to get stuck forever. wdyt?

samuela avatar Aug 29 '22 23:08 samuela

Yeah I think it's ok, I only just fixed the darwin build, and I'm not sure how long it was broken before that, so I don't think there would be anyone depending on it.

uri-canva avatar Aug 30 '22 00:08 uri-canva

Let's mark it as broken and add a comment with all the context / links to the issues for future reference. That way when someone looks at it in the future they won't have to start figuring it out from scratch.

uri-canva avatar Aug 30 '22 00:08 uri-canva

Yeah, I def don't want you to think that the x86_64-darwin work was for nothing @uri-canva! We should def commit to getting darwin support back up and running as soon as upstream progress on this issue allows.

samuela avatar Aug 30 '22 01:08 samuela

so we can merge this now or not?

SuperSandro2000 avatar Aug 30 '22 16:08 SuperSandro2000

I think we can!

samuela avatar Aug 30 '22 17:08 samuela