Ronghang Hu issues

Results: 6 issues by Ronghang Hu

use XLA patched linear in FSDP (fix #3811 and #3718) and expose options on padding in all-gather and pinning memory

This PR applies a patch to `nn.Linear` (`torch.nn.functional.linear`) in XLA FSDP so that the backward pass of `nn.Linear` uses its weight parameter rather than an intermediate result. This resolves the...

[RFC] A high-level GSPMD API in PT/XLA (based on `xs.mark_sharding`)

## 🚀 [RFC] A high-level GSPMD API in PT/XLA (based on `xs.mark_sharding`) This RFC proposes a high-level API for GSPMD through a wrapper class and a partitioning rule function, based...

Labels: enhancement, nostale, SPMD

Autograd discrepancy in `nn.Linear` (`torch.nn.functional.linear`) between native PyTorch and PyTorch/XLA

## 🐛 Bug There seems to be a discrepancy (in addition to https://github.com/pytorch/xla/issues/3718) in how `torch.nn.Linear` (`torch.nn.functional.linear`) is implemented and dispatched between native PyTorch and PyTorch/XLA. In particular, **the...

Labels: bug

Mismatched rank in collective ops (all-gather, reduce-scatter, and all-to-all) in PJRT runtime

allowing `xm.get_ordinal()` and default device in `xm.xla_device()` in PJRT

XLA profiler issues: 1) TPU device trace does not show trace annotations, and 2) PJRT only captures 2 cores