pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Relates to https://ontrack-internal.amd.com/browse/SWDEV-461590
Porting recent CK GEMM backend changes to ROCm
Copy of: https://github.com/ROCm/pytorch/pull/1457 Cherry-pick of https://github.com/pytorch/pytorch/pull/130331/files Extends the change in https://github.com/pytorch/pytorch/pull/127729 Depending on gcnArch, the API used to return socket power changes. This will help us handle both...
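The gcnArch-dependent dispatch described above can be sketched as follows. This is a minimal illustration only, assuming that newer architectures report an *average* socket power while older ones report the *current* socket power; the function and metric names here (`select_power_api`, `"average_socket_power"`, `"current_socket_power"`) are hypothetical, not actual ROCm SMI symbols.

```python
def select_power_api(gcn_arch: str) -> str:
    """Pick which (hypothetical) power metric to query for a given gcnArch.

    Assumption for illustration: gfx94x-class GPUs expose an average
    socket power reading, while earlier architectures expose a current
    socket power reading, so the caller must branch on the arch string.
    """
    if gcn_arch.startswith("gfx94"):
        return "average_socket_power"
    return "current_socket_power"


# Example: dispatch differs by architecture string.
print(select_power_api("gfx942"))  # newer arch
print(select_power_api("gfx90a"))  # older arch
```

Branching once on the architecture string and returning a tag (rather than calling the backend directly) keeps the arch-specific decision in one place, which is the kind of handling the PR description alludes to.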
Fixes #ISSUE_NUMBER
### 🐛 Describe the bug

```bash
In function ‘make_unique’,
    inlined from ‘allocate’ at /home/musclez/ComfyUI/opt/rocm/pytorch/torch/csrc/jit/runtime/static/impl.h:1129:47,
    inlined from ‘__ct ’ at /home/musclez/ComfyUI/opt/rocm/pytorch/torch/csrc/jit/runtime/static/impl.h:1114:41,
    inlined from ‘__ct_base ’ at /home/musclez/ComfyUI/opt/rocm/pytorch/torch/csrc/jit/runtime/static/impl.cpp:2260:7:
/usr/local/include/c++/13.3.1/bits/unique_ptr.h:1085:30: warning: argument 1...
```
`_c10d_functional_autograd::all_to_all_single` does not appear to be implemented on ROCm. Note: another unfixed problem is the mismatch in outputs between torch.ops.aten._scaled_dot_product_flash_attention and _scaled_dot_product_chunk_flash_attention. We need to fix both problems to enable this UT. Fixes SWDEV-459618
Fixes SWDEV-487907. Verified that throwing an exception for distributed works correctly on a single GPU with the command: python .automation_scripts/run_pytorch_unit_tests.py --priority_test
With the latest IFU into the rocm6.3_internal_testing branch, we pulled in SymmetricMemory code, which is also used by intra-node communication. SymmetricMemory also introduces a new memory allocator called CUDASymmetricMemoryAllocator....
**This PR is created for debugging. It will be closed in the future.**