pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Environment variable PYTORCH_MIOPEN_SUGGEST_NHWC=1 enables MIOpen batchnorm for NHWC
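As a minimal sketch of how that variable would be used: it must be set in the environment before PyTorch initializes MIOpen, for example at the top of a training script. The `torch` usage in the comments is illustrative and assumes a ROCm build of PyTorch.

```python
import os

# Opt in to MIOpen's NHWC (channels_last) batchnorm path; the variable
# name comes from the entry above. Set it before importing torch.
os.environ["PYTORCH_MIOPEN_SUGGEST_NHWC"] = "1"

# With a ROCm PyTorch build, one would then run the model in
# channels_last memory format so the NHWC kernels are actually hit:
#   import torch
#   model = model.to(memory_format=torch.channels_last)
#   x = x.to(memory_format=torch.channels_last)
print(os.environ["PYTORCH_MIOPEN_SUGGEST_NHWC"])
```

Equivalently, it can be set on the command line: `PYTORCH_MIOPEN_SUGGEST_NHWC=1 python train.py`.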
Reverts the AMAX workaround now that hipblasLT supports AMAX. hipblasLT does not accept a nullptr for scale D, so we create a dummy scalar tensor with the value 1.0...
### 🐛 Describe the bug Hi, I profiled the generation of text with the Mistral 7b LLM on my MI100 GPU and saw that some gemv fp16 kernels don't seem...
### 🐛 Describe the bug Hi, When doing text generation with Mistral 7b with Hugging Face transformers on a MI100 GPU, I can see in the collected torch trace that a...
### 🚀 The feature, motivation and pitch Enable support for Flash Attention, Memory Efficient, and SDPA kernels for AMD GPUs. At present, using these gives the below warning with the latest nightlies...
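For context on what those kernels compute: SDPA is scaled dot-product attention, softmax(QKᵀ/√d)·V. A minimal pure-Python sketch of the math follows (a reference implementation only, not the fused Flash Attention or Memory Efficient kernel, and independent of PyTorch):

```python
import math

def scaled_dot_product_attention(q, k, v):
    """Reference SDPA over plain lists of row vectors.

    q: (n_q, d) queries, k: (n_k, d) keys, v: (n_k, d_v) values.
    Returns an (n_q, d_v) list of output rows.
    """
    d = len(q[0])
    out = []
    for qi in q:
        # Attention scores: q . k / sqrt(d) for every key row.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output row is the attention-weighted sum of value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

The fused kernels requested above compute exactly this, but tiled so the full score matrix never materializes in memory.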
### 🐛 Describe the bug The quick start example from the fastai course no longer runs. I get the following error when running [quickstart.py](https://gist.github.com/briansp2020/92c56be60caeb2e497eaa275b85e13fb). It used to run fine when I...
add std and cub::BLOCK_LOAD_WARP_TRANSPOSE Jira ID: [SWDEV-458189](https://ontrack-internal.amd.com/browse/SWDEV-458189) Fixes #ISSUE_NUMBER
Pushing to our internal fork. Already merged upstream here: https://github.com/pytorch/pytorch/pull/123275
They were disabled in AOTriton V1, but V2 should fix most of them. Passed with:
```
PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda" python test/test_meta.py -k flash_attention -v
PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda" python test/test_ops.py -k flash_attention -v
...
```