Jiewen Tan

Results 9 issues of Jiewen Tan

Summary: To trace through c10d::all_gather, AOT needs to support TensorList in-place ops. Companion PyTorch PR: [pytorch#77940](https://github.com/pytorch/pytorch/pull/77806). Test Plan: WIP.

cla signed

Unsolicited dialogs or alerts are often disruptive and disliked by users. The Level 1 spec did not require or foresee that disruptive UI would be shown in response to makeCredential or...

type:technical
stat:breaking

Author: @kumpera Summary: A companion change to pytorch/pytorch#84224. Test Plan: CI.

It takes forever to run any tests on XLA GPU. And suspicious messages are shown:
```
(pytorch) jwtan@jwtan-v100-4:~/work/pytorch/xla$ MASTER_ADDR=localhost MASTER_PORT=6000 LD_LIBRARY_PATH=/opt/conda/lib/ python test/test_ddp.py TestXrtDistributedDataParallel.test_ddp_correctness
Running tests under Python 3.10.6: /opt/conda/envs/pytorch/bin/python3...
```

triaged
xla:gpu

Summary: This commit adds a test case for a larger model that triggers multiple all_reduces instead of one. Test Plan: XRT: MASTER_ADDR=localhost MASTER_PORT=6000 python test/test_ddp.py TestXrtDistributedDataParallel.test_ddp_correctness_large_net PJRT: PJRT_DEVICE=TPU...

triaged

xla/test/test_ddp.py is flaky on GPU. Investigate and re-enable it.

triaged
ddp

### 🐛 Describe the bug
Here is the PoC:
```
import torch
import functorch

# Reduced from test_torch.py: test_exponential
def poc7():
    device = 'cpu'
    test = (-0, float('inf'))
    t =...
```

Summary: This pull request introduces the make_kernel_from_pallas API, the top-level API for interacting with the Pallas integration. It takes a pallas_call wrapper and then makes it a custom...

backport_2.3

Summary: Adds an early exit for clip_grad_norm_. Test Plan: PJRT_DEVICE=TPU python test/test_operations.py -v -k test_clip_grad_norm_