Andres Lugo
Implements forward automatic differentiation support for miopen_batch_norm and unskips the associated unit tests. Also fixes a class of functorch-related unit tests that fail due to failing a...
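A minimal sketch (not taken from the PR) of the path these tests exercise: forward-mode AD through a batch-norm layer, assuming a ROCm build where BatchNorm2d dispatches to miopen_batch_norm.

```python
import torch
import torch.autograd.forward_ad as fwAD

bn = torch.nn.BatchNorm2d(8).to("cuda")        # on ROCm this dispatches to miopen_batch_norm
x = torch.randn(4, 8, 16, 16, device="cuda")
tangent = torch.randn_like(x)                  # direction for the forward-mode derivative

with fwAD.dual_level():
    dual_x = fwAD.make_dual(x, tangent)        # pair the primal input with its tangent
    dual_out = bn(dual_x)
    primal_out, jvp = fwAD.unpack_dual(dual_out)  # jvp is the forward-mode derivative of bn at x
```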
Reverts the AMAX workaround now that hipblasLT supports AMAX. hipblasLT does not accept a nullptr for scale D, so we create a dummy scalar tensor with the value 1.0...
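Illustrative sketch of the idea only (the actual change lives in the ATen C++ hipblasLt wrapper, not in Python): rather than handing hipblasLT a null D-scale pointer, allocate a one-element tensor holding 1.0 and pass its data pointer instead.

```python
import torch

# Dummy scale-D value of 1.0; in the C++ path its data pointer is what gets
# passed to the hipblasLT GEMM call in place of nullptr.
scale_d = torch.ones((), dtype=torch.float32, device="cuda")
```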
Pushing to our internal fork. Already merged upstream here: https://github.com/pytorch/pytorch/pull/123275
Hey @jithunnair-amd, this PR is the change to fix the ldl_factor tests regarding the "hermitian" flag. I know we wanted to wait until hipsolver was enabled by default (hence the...
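For reference, a minimal sketch (not from the PR) of the call these tests cover: torch.linalg.ldl_factor with hermitian=True on a complex matrix, assuming a ROCm device.

```python
import torch

A = torch.randn(4, 4, dtype=torch.complex64, device="cuda")
A = A + A.mH                                    # make A Hermitian so the flag is meaningful
LD, pivots = torch.linalg.ldl_factor(A, hermitian=True)
```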
Porting recent ck gemm backend changes to ROCm
Fixes #ISSUE_NUMBER
Initial prototype for the sdpa ck backend. Does not support an odd number of attention heads.
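A minimal sketch (not from the PR) of an SDPA call this backend would serve; note the even head count, since odd numbers of attention heads are unsupported in this prototype.

```python
import torch
import torch.nn.functional as F

# [batch, heads, seq_len, head_dim] with 8 heads (even)
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = F.scaled_dot_product_attention(q, k, v)
```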
Fixes #ISSUE_NUMBER
Creating this so I can trivially see all sdpa ck tile changes in one place
Replaces https://github.com/ROCm/pytorch/pull/1592 with an updated implementation of the CK gemm backend; the previous PR can be closed. This PR will generate the CK kernels necessary for flash attention. Currently they will be generated in pytorch/aten/src/ATen/native/transformers/hip/flash_attn/. The...