Jeff Daily
@andrewor14 this link should take you to the trunk commit summary for when the PR landed. You'll see 3 ROCm failures. https://hud.pytorch.org/pytorch/pytorch/commit/5680f565d5b7d4aa412a3988d3d91ca4c5679303
@xinyazhang and/or @groenenboomj can you assist with the ROCm failures?
@pytorchbot revert
@pytorchbot revert -m "broke ROCm CI while ROCm was in unstable status" -c ignoredsignal
@xw285cornell would appreciate your review of this. I'm assuming this PR will break your internal build?
@trixirt I am in favor of this PR. My apologies for adding yet more exposure to hipblaslt APIs that you need to work around again. Please resolve conflicts likely due...
> There's a couple failing tests. Hud indicates errors are related https://hud.pytorch.org/pr/125083

Apologies, I should have moved it back to draft after my first commit once I realized this wasn't...
> Thanks, Jeff. Nit: Maybe a class level comment which says that the str format is expected to match the format used by nvidia-smi and can be counted on for...
@pytorchbot merge
> @pytorchbot revert -m "test_uuid is flaky? ex https://github.com/pytorch/pytorch/actions/runs/8988855916/job/24692369523 https://hud.pytorch.org/flakytest?name=test_uuid&suite=TestCuda&file=%25&limit=300" -c nosignal

So sorry, I thought all was well. I will take another crack at this. Thanks for the link...