
Tensors and Dynamic neural networks in Python with strong GPU acceleration

Results: 133 pytorch issues

Relates to https://ontrack-internal.amd.com/browse/SWDEV-461590

Porting recent CK GEMM backend changes to ROCm

Copy of: https://github.com/ROCm/pytorch/pull/1457. Cherry-pick of https://github.com/pytorch/pytorch/pull/130331/files, extending the change in https://github.com/pytorch/pytorch/pull/127729. Depending on gcnArch, the API used to return socket power differs. This will help us handle both...
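
The gcnArch-dependent dispatch described above might look like the following minimal sketch. This is an illustration, not the actual diff from those PRs: it assumes the amdsmi Python bindings, and that `amdsmi_get_power_info` reports socket power under a `current_socket_power` key on newer architectures and `average_socket_power` on older ones.

```python
# Minimal sketch, assuming the amdsmi Python bindings; not the actual
# change from the PRs above. Idea: the key under which socket power is
# reported differs by gcnArch, so prefer the current reading and fall
# back to the averaged one.
import amdsmi

def get_socket_power_watts(device_index: int = 0):
    amdsmi.amdsmi_init()
    try:
        handle = amdsmi.amdsmi_get_processor_handles()[device_index]
        info = amdsmi.amdsmi_get_power_info(handle)
        # Assumed keys: newer gcnArch populates "current_socket_power";
        # older ones only report "average_socket_power".
        power = info.get("current_socket_power")
        if power in (None, "N/A"):
            power = info.get("average_socket_power")
        return power
    finally:
        amdsmi.amdsmi_shut_down()
```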

Fixes #ISSUE_NUMBER

### 🐛 Describe the bug

```bash
In function ‘make_unique’,
    inlined from ‘allocate’ at /home/musclez/ComfyUI/opt/rocm/pytorch/torch/csrc/jit/runtime/static/impl.h:1129:47,
    inlined from ‘__ct ’ at /home/musclez/ComfyUI/opt/rocm/pytorch/torch/csrc/jit/runtime/static/impl.h:1114:41,
    inlined from ‘__ct_base ’ at /home/musclez/ComfyUI/opt/rocm/pytorch/torch/csrc/jit/runtime/static/impl.cpp:2260:7:
/usr/local/include/c++/13.3.1/bits/unique_ptr.h:1085:30: warning: argument 1...
```

`_c10d_functional_autograd::all_to_all_single` does not appear to be implemented on ROCm. Note: another unfixed problem is the mismatch between the outputs of torch.ops.aten._scaled_dot_product_flash_attention and _scaled_dot_product_chunk_flash_attention; a minimal repro sketch follows. We need to fix both problems to enable this UT. Fixes SWDEV-459618
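
A repro of the reported output mismatch might look like the sketch below. Assumptions: the chunked variant lives under `torch.ops.aten` with a signature mirroring `_scaled_dot_product_flash_attention`'s, and both return a tuple whose first element is the attention output; neither is confirmed here.

```python
# Hypothetical repro sketch; the chunked op's name and signature are
# assumed to mirror _scaled_dot_product_flash_attention's.
import torch

q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# First tuple element is the attention output.
out_flash = torch.ops.aten._scaled_dot_product_flash_attention(q, k, v)[0]
out_chunk = torch.ops.aten._scaled_dot_product_chunk_flash_attention(q, k, v)[0]

# Per the report above, these currently diverge on ROCm.
print(torch.allclose(out_flash, out_chunk, atol=1e-3, rtol=1e-3))
```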

Fixes SWDEV-487907. Verified that throwing an exception for distributed works correctly on a single GPU with the command: `python .automation_scripts/run_pytorch_unit_tests.py --priority_test`

With the latest IFU into the rocm6.3_internal_testing branch, we pulled in the SymmetricMemory code, which is also used by intra-node communication. SymmetricMemory also introduces a new memory allocator called CUDASymmetricMemoryAllocator....

**This PR was created for debugging. It will be closed in the future.**