
Tensors and Dynamic neural networks in Python with strong GPU acceleration

Results: 133 pytorch issues

Environment variable PYTORCH_MIOPEN_SUGGEST_NHWC=1 enables MIOpen batchnorm for NHWC
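A rough sketch of how this flag would typically be exercised (the tensor shapes are arbitrary and the gating inside ATen/MIOpen is not shown): set the variable before the first batchnorm dispatch and feed a channels_last input.

```python
import os
# Usually exported in the shell; setting it here only takes effect if done
# before the first MIOpen batchnorm dispatch.
os.environ["PYTORCH_MIOPEN_SUGGEST_NHWC"] = "1"

import torch
import torch.nn as nn

# On a ROCm build, the "cuda" device refers to the AMD GPU.
x = torch.randn(8, 64, 56, 56, device="cuda").to(memory_format=torch.channels_last)  # NHWC layout
bn = nn.BatchNorm2d(64).to("cuda")

y = bn(x)
print(y.is_contiguous(memory_format=torch.channels_last))
```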

Reverts the AMAX workaround now that hipblasLT supports AMAX. hipblasLT does not accept a nullptr for scale D, so we create a dummy scalar tensor with the value 1.0...
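A minimal Python sketch of that idea (not the actual C++ change; the helper name is made up for illustration): when no result scale is supplied, substitute a scalar tensor holding 1.0 so the GEMM backend receives a valid pointer and the scaling becomes a no-op.

```python
import torch

def resolve_scale_d(scale_d=None, device="cuda"):
    """Hypothetical helper: never hand a null scale-D to the backend."""
    if scale_d is None:
        # dummy scalar tensor with the value 1.0, mirroring the description above
        return torch.ones((), dtype=torch.float32, device=device)
    return scale_d
```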

### 🐛 Describe the bug Hi, I profiled the generation of text with the Mistral 7b LLM on my MI100 GPU and saw that some gemv fp16 kernels don't seem...
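For context, a trace like the one described can be collected with torch.profiler. The sketch below profiles a gemv-shaped fp16 matmul (matrix times a single column) rather than the actual Mistral 7b decode step, purely to show how kernel names and timings surface:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# gemv-shaped workload: a (4096 x 4096) fp16 matrix times a single fp16 column.
W = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
x = torch.randn(4096, 1, dtype=torch.float16, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(10):
        y = W @ x
    torch.cuda.synchronize()

# The kernel names in this table indicate which GPU kernels were dispatched.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```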

### 🐛 Describe the bug Hi, When doing text generation with Mistral 7b using Hugging Face transformers on an MI100 GPU, I can see in the collected torch trace that a...
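A sketch of how such a trace might be collected (assumes the transformers package is installed and a Mistral 7b checkpoint is available; any causal LM works for the mechanics):

```python
import torch
from torch.profiler import profile, ProfilerActivity
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

inputs = tok("Hello, my name is", return_tensors="pt").to("cuda")
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model.generate(**inputs, max_new_tokens=32)

# Open the resulting JSON in chrome://tracing or Perfetto to inspect the kernels.
prof.export_chrome_trace("mistral_generate_trace.json")
```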

### 🚀 The feature, motivation and pitch Enable support for the Flash Attention and Memory Efficient SDPA kernels on AMD GPUs. At present, using these gives the warning below with the latest nightlies...
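To check the behavior, SDPA can be pinned to a single backend; a minimal sketch (the sdp_kernel context manager is the PyTorch 2.x-era API, and newer releases expose torch.nn.attention.sdpa_kernel instead):

```python
import torch
import torch.nn.functional as F

q, k, v = (torch.randn(2, 8, 1024, 64, dtype=torch.float16, device="cuda")
           for _ in range(3))

# Force the Flash Attention backend only; on builds without AMD support this
# is where the warning (and the fallback or failure) shows up.
with torch.backends.cuda.sdp_kernel(enable_flash=True,
                                    enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)
```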

### 🐛 Describe the bug The quick start example from the fastai course no longer runs. I get the following error when running [quickstart.py](https://gist.github.com/briansp2020/92c56be60caeb2e497eaa275b85e13fb). It used to run fine when I...

Add std and cub::BLOCK_LOAD_WARP_TRANSPOSE. Jira ID: [SWDEV-458189](https://ontrack-internal.amd.com/browse/SWDEV-458189). Fixes #ISSUE_NUMBER

Pushing to our internal fork. Already merged upstream here: https://github.com/pytorch/pytorch/pull/123275

They were disabled in AOTriton V1, but V2 should fix most of them. Passed with:

```
PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda" python test/test_meta.py -k flash_attention -v
PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda" python test/test_ops.py -k flash_attention -v
```
...