
PyTorch tests fail due to outdated JAX version

Open • pgmoka opened this issue 5 months ago • 8 comments

PyTorch tests seem to be failing because PyTorch/XLA depends on an older version of JAX. Specifically, they are seeing:

error: jaxlib 0.6.2 is installed but jaxlib<=0.7.0,>=0.7.0 is required by {'jax'}
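Both bounds in this constraint are 0.7.0, so it effectively means "exactly jaxlib 0.7.0": the installed jax release accepts only that jaxlib, and 0.6.2 is rejected. The sketch below is a simplified, hand-rolled checker that shows how such a comma-separated range is evaluated. The helpers `parse_version` and `satisfies` are hypothetical illustrations, not the installer's actual code, and they assume purely numeric release versions; real tooling implements the full PEP 440 rules (pre-releases, local versions, and so on).

```python
import operator

# Hypothetical simplification: only numeric versions like "0.7.0" and the
# operators below are supported; real installers follow full PEP 440.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(version: str) -> tuple:
    """Turn "0.7.0" into a tuple that compares numerically.

    Trailing zeros are dropped so that "0.7" and "0.7.0" compare equal.
    """
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(installed: str, constraint: str) -> bool:
    """Return True if `installed` meets every comma-separated clause."""
    target = parse_version(installed)
    for clause in constraint.split(","):
        clause = clause.strip()
        # Two-character operators are listed first so ">=" wins over ">".
        for op in (">=", "<=", "==", ">", "<"):
            if clause.startswith(op):
                bound = parse_version(clause[len(op):].strip())
                if not _OPS[op](target, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# The requirement jax declares on jaxlib, taken from the error message.
constraint = "<=0.7.0,>=0.7.0"
print(satisfies("0.6.2", constraint))  # False: below the lower bound
print(satisfies("0.7.0", constraint))  # True: the only version allowed
print(satisfies("0.7.1", constraint))  # False: above the upper bound
```

Under this constraint only jaxlib 0.7.0 satisfies the installed jax, so the jax and jaxlib pins have to be moved together.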

We should probably look into updating https://github.com/pytorch/xla/blob/master/setup.py#L118-L123 to a more current version.

If the 2.8 cut is also affected, we should backport the update to it (see https://github.com/pytorch/xla/issues/9433 for the process).

pgmoka, Jul 22 '25 21:07

Once this is resolved, we have been asked to close https://github.com/pytorch/pytorch/issues/158876.

pgmoka, Jul 22 '25 22:07

Other tests seem to be affected as well: https://github.com/pytorch/pytorch/actions/runs/16457516513/job/46518864660

pgmoka, Jul 23 '25 16:07

FYI, this PR will need to be reverted once the XLA tests are fixed: https://github.com/pytorch/pytorch/pull/159272

ZainRizvi, Jul 29 '25 20:07

I believe the issue has been resolved in https://github.com/pytorch/xla/pull/9565. Recent PyTorch test runs seem to confirm this:

  • https://github.com/pytorch/pytorch/actions/runs/17406954070
  • https://github.com/pytorch/pytorch/actions/runs/17406351178

pgmoka, Sep 02 '25 17:09

@ZainRizvi, please let me know if this is still affecting the tests.

pgmoka, Sep 02 '25 17:09

Hi @pgmoka, you can verify how the tests are doing in either of the following two ways:

  1. Check against the main branch on HUD. Uncheck the "Hide unstable jobs" option so that unstable jobs are shown, then check whether the XLA jobs that were moved to unstable in https://github.com/pytorch/pytorch/pull/159272 are now passing.
  2. Check on your PR directly. When making a PR, especially one intended to fix unstable jobs, you can add the label "ciflow/unstable"; all jobs currently marked as unstable (including these ones) will then run automatically against your PR, so you can see whether they now pass.

ZainRizvi, Sep 02 '25 17:09

Reopening this issue so that you can:

  1. Validate that the jobs are passing and
  2. Move the jobs out of the unstable state once they do in fact pass.

My comment above explains how to do #1. #2 is done by reverting this PR: https://github.com/pytorch/pytorch/pull/159272/files

ZainRizvi, Sep 02 '25 18:09

> I believe the issue has been resolved in #9565. Recent PyTorch test runs seem to confirm this:
>
>   • https://github.com/pytorch/pytorch/actions/runs/17406954070
>   • https://github.com/pytorch/pytorch/actions/runs/17406351178

Note that the workflow runs linked in your comment do not actually run the failing job, because it was removed from pull.yml in the PR I'm asking you to revert.

ZainRizvi, Sep 02 '25 18:09