Huy Do

Results 161 comments of Huy Do

> @huydhn can we just forward fix and skip the test in ROCm?

Yes, plz go ahead with the fix, I can stamp it if you need

@jbschlosser I have also just noticed another periodic failure coming out of this PR https://hud.pytorch.org/pytorch/pytorch/commit/2a41fc03903de63270d325bd1886a50faf32d7e4#26340619959. It's a CUDA memory leak failure (we only run memory leak check periodically) and your...

@sanketpurandare I'm seeing the new test `test_tracker_multi_group_eager` failing on the ROCm distributed job https://hud.pytorch.org/pytorch/pytorch/commit/287c68c5eca2e15bf73b84fe9e39755ae3f842ba#26578545778. Could you help take a look? The job is only run periodically, so its signal was missed...

Btw, I disabled the test in https://github.com/pytorch/pytorch/issues/129390 to keep trunk sane. In your fix PR, please add "Fixes https://github.com/pytorch/pytorch/issues/129390" to the PR description so that the test runs on your PR

@pytorchbot revert -m 'Sorry for reverting your change, but there are real failures on the PR that sneak in during the log classifier outage' -c weird

@pytorchbot rebase

Answer to the capacity question https://github.com/pytorch/pytorch/pull/125399#issuecomment-2345746062

@AlekseiNikiforovIBM I have sent out an invite to you with write permission to the repo so that you will have the permission to run CI on your end without our...