Antoni Viros
### 🐛 Describe the bug
While testing torch.compile() for a HF T5-style model (with some architectural changes), I have seen accuracy errors. When trying to run the accuracy minifier to...
### System Info
transformers main branch
### Who can help?
@sgugger
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
- [...
### 🐛 Describe the bug
For models that use the dynamic shape of one of their tensors as part of an indexing op, torch.export complains that it can't deal with it....
This PR adds basic support for torch.compile() for the non-varlen variants of Flash Attention. This essentially allows models that use `flash_attn_qkvpacked_func`, `flash_attn_kvpacked_func`, and `flash_attn_func` to compile without graph...