zzlol63


On Windows, PyTorch does not ship with FlashAttention support pre-compiled into the `torch.nn.functional.scaled_dot_product_attention` method, whereas on Linux it does, leaving a performance gap between the two platforms. This...
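For context, a minimal pure-Python sketch of what `scaled_dot_product_attention` computes, i.e. `softmax(QK^T / sqrt(d)) V`. This is the same math that FlashAttention evaluates in a fused, memory-efficient kernel; the reference version below materializes the full score matrix, which is exactly the cost the optimized backends avoid. The function name and list-based layout are illustrative, not PyTorch's implementation.

```python
import math

def scaled_dot_product_attention(q, k, v):
    """Reference (unoptimized) scaled dot-product attention on plain
    lists of rows: softmax(q @ k^T / sqrt(d)) @ v."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    # scores[i][j] = dot(q[i], k[j]) / sqrt(d) -- the full matrix that
    # FlashAttention avoids materializing in memory.
    scores = [[sum(qi * kj for qi, kj in zip(qrow, krow)) * scale
               for krow in k] for qrow in q]
    out = []
    for row in scores:
        # Row-wise softmax, numerically stabilized by subtracting the max.
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * vrow[c] for w, vrow in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

# Tiny example: 2 queries, 2 keys/values, head dimension 2.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention(q, k, v))
```

Optimized backends (FlashAttention, memory-efficient attention, or the math fallback) all produce this same result; they differ only in speed and memory use, which is why the missing Windows kernels matter.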

### Describe your use-case

The latest version of Diffusers supports selecting a specific attention backend, such as FlashAttention-2 or FlashAttention-3 (which supports the backward pass). OneTrainer could potentially...

enhancement