Issues and pull requests by Rui (7 results)

In this PR, we add support for a new initialization path that enables multi-node SPMD on Neuron. To keep the change minimal, we retain the `xla.init()` API, but...
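For context, a minimal sketch of the single-host SPMD setup that torch_xla already exposes; the multi-node Neuron path and the `xla.init()` entry point this PR modifies are not reproduced here, so treat this as background only:

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

# Enable SPMD mode before any device data is created.
xr.use_spmd()

# Build a 1-D mesh over all addressable devices.
num_devices = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(num_devices), (num_devices,), ("data",))

# Shard the leading dimension of a tensor across the mesh.
t = torch.randn(8, 128).to(xm.xla_device())
xs.mark_sharding(t, mesh, ("data", None))
```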

Simple CR to avoid a segmentation fault when placeholder tensors are involved and we attempt to dereference the device from the buffer. It fixes the segfault reported in https://github.com/pytorch/xla/issues/9049...
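The actual fix lives in the runtime's buffer handling, but the shape of the guard is simple: a placeholder tensor has no backing buffer, so any code that reads the device off the buffer must check for that case first. A hypothetical Python rendering of the pattern (all names invented for illustration):

```python
def device_of(tensor):
    # Hypothetical attribute: placeholder tensors carry no materialized buffer.
    buffer = getattr(tensor, "buffer", None)
    if buffer is None:
        # Placeholder case: bail out instead of dereferencing a null handle,
        # which is what previously caused the segmentation fault.
        return None
    return buffer.device
```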

## 🐛 Bug

Test:

```python
def test_sharded_matmul(tensor_a_shape, tensor_b_shape, mesh_shape,
                        sharding_spec_a, sharding_spec_b):
  cpu_device = torch.device("cpu")
  neuron_device = xm.xla_device()
  device_ids = np.array(range(NUM_DEVICES))
  mesh = Mesh(device_ids, mesh_shape, ("tp1", "tp2"))
  tensor_a_cpu = torch.rand(tensor_a_shape, dtype=torch.float32, ...
```

bug
torchxla2
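The excerpt above truncates mid-call. For illustration only, here is a self-contained sketch of how such a sharded-matmul check typically finishes; the default shapes, sharding specs, device count, and tolerances are assumptions, not taken from the issue:

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.spmd as xs
from torch_xla.distributed.spmd import Mesh

def sharded_matmul_check(tensor_a_shape=(16, 32), tensor_b_shape=(32, 8),
                         mesh_shape=(2, 2), sharding_spec_a=("tp1", "tp2"),
                         sharding_spec_b=("tp2", None), num_devices=4):
  # CPU reference result.
  tensor_a_cpu = torch.rand(tensor_a_shape, dtype=torch.float32)
  tensor_b_cpu = torch.rand(tensor_b_shape, dtype=torch.float32)
  expected = torch.matmul(tensor_a_cpu, tensor_b_cpu)

  # Shard both operands over a 2-D ("tp1", "tp2") mesh on the XLA device.
  mesh = Mesh(np.array(range(num_devices)), mesh_shape, ("tp1", "tp2"))
  tensor_a = tensor_a_cpu.to(xm.xla_device())
  tensor_b = tensor_b_cpu.to(xm.xla_device())
  xs.mark_sharding(tensor_a, mesh, sharding_spec_a)
  xs.mark_sharding(tensor_b, mesh, sharding_spec_b)

  # The sharded result should match the CPU reference.
  actual = torch.matmul(tensor_a, tensor_b)
  torch.testing.assert_close(actual.cpu(), expected, rtol=1e-3, atol=1e-3)
```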

## 🚀 Feature

We propose an accelerator-agnostic, hybrid Single-Program Multiple-Data (SPMD) / Multiple-Program Multiple-Data (MPMD) pipeline parallelism implementation in PyTorch XLA. The key objectives are:

* Enable efficient model-parallel training for large...

enhancement
distributed
RFC
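To make the MPMD half of the proposal concrete, here is a toy microbatched two-stage pipeline in plain PyTorch; it runs the stages sequentially on one host, so it illustrates only the scheduling idea, not the RFC's actual hybrid SPMD/MPMD design, and every name in it is invented for the example:

```python
import torch
import torch.nn as nn

# Two pipeline stages; in a real MPMD deployment each stage would run its own
# program on a separate device group, with activations sent between groups.
stage0 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
stage1 = nn.Linear(64, 10)

def pipeline_forward(x, num_microbatches=4):
  # GPipe-style microbatching: split the batch and feed chunks through the
  # stages. Here the loop is sequential; a real pipeline overlaps stage0 on
  # microbatch i+1 with stage1 on microbatch i to keep all devices busy.
  outputs = []
  for microbatch in x.chunk(num_microbatches):
    outputs.append(stage1(stage0(microbatch)))
  return torch.cat(outputs)

y = pipeline_forward(torch.randn(16, 32))
```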

This PR extends the placeholder feature (https://github.com/pytorch/xla/issues/8612) to accommodate sharded tensors for SPMD. It also fixes a typo in the existing binding for...

We have only built Docker images for Python 3.10 since PyTorch/XLA 2.1. This has limited our ability to seamlessly debug and test changes for any given Python version, particularly for...

enhancement
install