Huy Do
> @huydhn can we just forward fix and skip the test in ROCm?

Yes, plz go ahead with the fix, I can stamp it if you need.
@jbschlosser I have also just noticed another periodic failure coming from this PR https://hud.pytorch.org/pytorch/pytorch/commit/2a41fc03903de63270d325bd1886a50faf32d7e4#26340619959. It's a CUDA memory leak failure (we only run the memory leak check periodically) and your...
@sanketpurandare I'm seeing the new test `test_tracker_multi_group_eager` failing in the ROCm distributed job https://hud.pytorch.org/pytorch/pytorch/commit/287c68c5eca2e15bf73b84fe9e39755ae3f842ba#26578545778. Could you help take a look? The job only runs periodically, so its signal was missed...
Btw, I disabled the test in https://github.com/pytorch/pytorch/issues/129390 to keep trunk sane. In your fix PR, please add "Fixes https://github.com/pytorch/pytorch/issues/129390" to the PR description so that the test runs on your PR.
@pytorchbot drci
@pytorchbot revert -m 'Sorry for reverting your change, but there are real failures on the PR that snuck in during the log classifier outage' -c weird
@pytorchbot rebase
Answer to the capacity question: https://github.com/pytorch/pytorch/pull/125399#issuecomment-2345746062
@AlekseiNikiforovIBM I have sent you an invite with write permission to the repo so that you have permission to run CI on your end without our...
@pytorchbot merge