2.4 backport PR request list
This issue tracks backports for the 2.4 release.
For any PRs you want to backport to 2.4, please reply with the following:
- Original PR link
- Reason to backport
- 2.4 backport PR link
- Original PR: https://github.com/pytorch/xla/pull/7157
- Reason to backport: Urgent fix for while_loop, a planned release feature, in the 2.4 release
- 2.4 backport PR link: https://github.com/pytorch/xla/pull/7306
- Original PR: https://github.com/pytorch/xla/pull/7155
- Reason to backport: Fix a bug in distributed checkpointing with padded tensors
- 2.4 backport PR link: https://github.com/pytorch/xla/pull/7243
- Original PR: #7236
- Reason to backport: Include build dependencies for CUDA 12.4
- 2.4 backport PR link: #7244
- Original PR: #7219
- Reason to backport: Minor addition to Triton functionality to support CUDA plugins
- 2.4 backport PR link: #7303
- Original PRs: https://github.com/pytorch/xla/pull/7254, https://github.com/pytorch/xla/pull/7258
- Reason to backport: OpenXLA pin update to enable new telemetry in PyTorch/XLA; requested by Aman
- 2.4 backport PR links: https://github.com/pytorch/xla/pull/7261, https://github.com/pytorch/xla/pull/7262
- Original PRs: #7249 and #7268
- Reason to backport: Enables new libtpu telemetry and detection of the CUDA plugin package (torch_xla_cuda_plugin) by default.
- Backport link: #7270
- Original PR: Enable bucketized all-reduce for gradients #7216
- Reason to backport: Parity with Neuron branch r2.1_aws_neuron
- Backport link: WIP
- Original PR: https://github.com/pytorch/xla/pull/7278
- Backport PR: https://github.com/pytorch/xla/pull/7284
- Reason: This fixes a regression. Without it, PyTorch/XLA always emits log spam about the PJRT version on load.
- Risk: Low.
- Original PR: https://github.com/pytorch/xla/pull/7271
- Backport PR: https://github.com/pytorch/xla/pull/7285
- Reason: This is a regression fix. Without it, several xlml tests will fail.
- Risk: Low, since it is just a revert.
- Original PR: https://github.com/pytorch/xla/pull/7301
- Backport PR: https://github.com/pytorch/xla/pull/7310
- Reason: It's a cherry-picked hotfix for `AttributeError: module 'numpy' has no attribute 'product'` in the r2.4 CI.
- Risk: Low, since the issue on master has been fixed by the original PR.
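For context, a minimal sketch of the failure and the fix, assuming the CI picked up NumPy 2.0 (where the long-deprecated `np.product` alias was removed, which matches the error above):

```python
import numpy as np

shape = (2, 3, 4)

# Under NumPy 2.0 the removed alias raises the error seen in CI:
#   np.product(shape)  # AttributeError: module 'numpy' has no attribute 'product'

# The supported spelling computes the same value:
size = np.prod(shape)
print(size)  # 24
```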
Original PRs:
- https://github.com/pytorch/xla/pull/7234
- https://github.com/pytorch/xla/pull/7246
- https://github.com/pytorch/xla/pull/7256
- https://github.com/pytorch/xla/pull/7322
- https://github.com/pytorch/xla/pull/7327
- https://github.com/pytorch/xla/pull/7328
- https://github.com/pytorch/xla/pull/7341
- https://github.com/pytorch/xla/pull/7631
Backport PRs:
- https://github.com/pytorch/xla/pull/7611
- https://github.com/pytorch/xla/pull/7643
- https://github.com/pytorch/xla/pull/7649
- https://github.com/pytorch/xla/pull/7666
- https://github.com/pytorch/xla/pull/7668
- https://github.com/pytorch/xla/pull/7669
- https://github.com/pytorch/xla/pull/7673
Reason: I want to enable eager mode for the 2.4 release since I want to make it the default in 2.6.
Risk: Low; all of the features are guarded by torch_xla.experimental.eager_mode, so these PRs should be no-ops if eager mode is disabled.
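To illustrate the guard, here is a minimal sketch of opting into eager mode (the exact call sites in the backported PRs may differ; `torch_xla.experimental.eager_mode` is the flag named in the Risk note above):

```python
import torch
import torch_xla
import torch_xla.core.xla_model as xm

# Eager mode is opt-in in 2.4; left at its default (off), the
# backported changes are expected to be no-ops.
torch_xla.experimental.eager_mode(True)

device = xm.xla_device()

# With eager mode enabled, each op executes immediately instead of
# being traced into a graph and compiled at a step boundary.
a = torch.randn(2, 2, device=device)
b = a @ a
print(b)
```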
Original PRs:
- #7617
- #7329
Backport PRs:
- #7616
- #7618
Reason: To fix CI.
Risk: Low, since this doesn't change anything in the torch_xla library.
Original PR: #7640
Backport PR: #7684
Reason: To fix the upstream PyTorch build.
Risk: Low; this doesn't change anything in the torch_xla library.
Original: https://github.com/pytorch/xla/pull/7231
Backport: https://github.com/pytorch/xla/pull/7708
Reason: Support MoE (Mixture of Experts).
Original PRs:
- https://github.com/pytorch/xla/pull/7700
- https://github.com/pytorch/xla/pull/7710
Backport PR:
- https://github.com/pytorch/xla/pull/7717
Reason: Backport the docs for eager mode.
Risk: Low; it's just a docs change.