Ting Lu

Results 49 comments of Ting Lu

We should not disable torchrec, since doing so leads to a failure in: [inductor / inductor-test-cuda13 / test (inductor_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/165029#53804722630) ([gh](https://github.com/pytorch/pytorch/actions/runs/18855123492/job/53804722630)). The previous fbgemm failure in torchrec_dlrm is gone now.

The only failure is in torchrec_dlrm, and it is due to fbgemm_gpu not being found. We should disable it for CUDA 13 until fbgemm adds CUDA 13 support. ``` cuda eval torchrec_dlrm Traceback (most recent...
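A minimal sketch of the kind of guard described above: skip a benchmark model only when its optional dependency is missing on CUDA 13 or newer. The names `OPTIONAL_DEPS`, `has_module`, and `should_skip` are hypothetical and are not part of the PyTorch benchmark runner; only `importlib.util.find_spec` is a real standard-library call.

```python
import importlib.util

# Hypothetical mapping from benchmark model to the optional module it needs.
OPTIONAL_DEPS = {"torchrec_dlrm": "fbgemm_gpu"}


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` is importable here."""
    return importlib.util.find_spec(name) is not None


def should_skip(model: str, cuda_major: int, module_available=has_module) -> bool:
    """Skip `model` only on CUDA 13+ and only while its dependency is missing.

    The CUDA 13 cutoff is an assumption taken from the comment above; the
    check re-enables the model automatically once the module is installed.
    """
    dep = OPTIONAL_DEPS.get(model)
    if dep is None:
        return False  # model has no optional dependency to worry about
    return cuda_major >= 13 and not module_available(dep)
```

Passing the availability check as a parameter keeps the decision testable without fbgemm_gpu installed, for example `should_skip("torchrec_dlrm", 13, lambda m: False)`.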

Multiple failures appeared after switching to gcc11: "ModuleNotFoundError: No module named 'tqdm'" in https://hud.pytorch.org/pr/pytorch/pytorch/165029#54145300158 and "RuntimeError: torch.compile is not supported on Python 3.14+" in https://hud.pytorch.org/pr/pytorch/pytorch/165029#54153008524. Switching back to gcc9.

Running into an outdated-driver error at inductor-smoke-test-cuda13 / test (inductor_torchbench_smoketest_perf, 1, 1, linux.aws.a100): https://github.com/pytorch/pytorch/actions/runs/18983177447/job/54229001993. The root cause is that on the AWS.A100 runner, driver installation is skipped if the runner is detected as...
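One way to confirm an outdated driver on a runner is to compare the installed version against a required minimum. The sketch below is illustrative only: `parse_version`, `driver_is_outdated`, and `query_driver_version` are hypothetical helpers, and the minimum version must be supplied by the caller (it is not taken from the CI scripts). The `nvidia-smi --query-gpu=driver_version --format=csv,noheader` query is a standard `nvidia-smi` invocation.

```python
import subprocess


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '535.104.05' into (535, 104, 5)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_outdated(installed: str, minimum: str) -> bool:
    """Return True if `installed` is strictly older than `minimum`."""
    return parse_version(installed) < parse_version(minimum)


def query_driver_version() -> str:
    """Ask nvidia-smi for the driver version of the first GPU.

    Requires nvidia-smi on PATH; raises CalledProcessError if it fails.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()[0].strip()
```

On a GPU machine this could be used as `driver_is_outdated(query_driver_version(), minimum)`, where `minimum` comes from the release notes of the CUDA version under test.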

Summary of pending failures: 1. The driver on A100 is outdated and needs an upgrade on the AWS.A100 runner (tried installing through https://github.com/pytorch/test-infra/pull/7433/files, no success yet): [inductor-periodic / inductor-smoke-test-cuda13 / test (inductor_torchbench_smoketest_perf, 1, 1, linux.aws.a100)](https://hud.pytorch.org/pr/pytorch/pytorch/165029#54557420832) ([gh](https://github.com/pytorch/pytorch/actions/runs/19095297584/job/54557420832))...

Latest attempt to verify after Andrey upgraded the A100 runner driver: https://github.com/pytorch/pytorch/actions/runs/19095297584 and https://github.com/pytorch/pytorch/actions/runs/19095297584/job/54831245402