zzlol63
As discussed on Discord, the issue appears specific to Windows, where the Torch SDP backend cannot use the native FlashAttention-2 based kernel because it isn't compiled with FlashAttention support in the...
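For anyone who wants to check their own build, here's a minimal sketch (the helper name `flash_sdp_available` is just illustrative, not OneTrainer code): it forces SDPA to use only the FlashAttention backend and catches the `RuntimeError` PyTorch raises when no kernel is available, which is what happens on the standard Windows wheels.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel  # PyTorch >= 2.3

def flash_sdp_available(device="cuda", dtype=torch.float16):
    # Dummy (batch, heads, seqlen, head_dim) tensor; flash requires fp16/bf16 on CUDA.
    q = torch.randn(1, 8, 128, 64, device=device, dtype=dtype)
    try:
        # Restrict SDPA to the FlashAttention backend only; if this build
        # wasn't compiled with FlashAttention support, the call raises
        # RuntimeError ("No available kernel") instead of silently falling
        # back to the math/mem-efficient backends.
        with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
            F.scaled_dot_product_attention(q, q, q)
        return True
    except RuntimeError:
        return False

if __name__ == "__main__":
    print("FlashAttention SDP backend usable:", flash_sdp_available())
```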
Did some further testing, this time with a fresh dataset (with optional masks), running on a natively booted Linux distro with identical settings in OneTrainer, and came back with...
I ran a set of tests using FLUX.1-dev on the same dataset. I did post some numbers previously, but realised I had made a huge mistake where the FlexAttention backend wasn't...
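The comment above is truncated, so I won't guess what the exact mistake was, but one easy benchmarking pitfall with FlexAttention is worth flagging: calling it eagerly runs a slow reference path, so it has to go through `torch.compile` to get the fused kernel. A minimal sketch:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention  # PyTorch >= 2.5

# flex_attention is only fast when compiled; the eager call falls back to a
# slow reference implementation, which can silently skew timing comparisons
# against the SDP backends.
flex_attention_compiled = torch.compile(flex_attention)

q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
out = flex_attention_compiled(q, q, q)  # (batch, heads, seqlen, head_dim)
```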
> So if, on Windows, the Torch SDP algorithm is much worse, the only alternative would be to use another external flash attention implementation. For example by using flash_attn (with...
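A rough sketch of what that could look like, assuming the external `flash-attn` package is installed (the wrapper function `attention` here is hypothetical, not how OneTrainer actually wires it up). Note that `flash_attn_func` expects `(batch, seqlen, heads, head_dim)` layout, unlike `F.scaled_dot_product_attention`'s `(batch, heads, seqlen, head_dim)`:

```python
import torch
import torch.nn.functional as F

try:
    from flash_attn import flash_attn_func  # external flash-attn package
    HAS_FLASH_ATTN = True
except ImportError:
    HAS_FLASH_ATTN = False

def attention(q, k, v):
    # q, k, v: (batch, heads, seqlen, head_dim), fp16/bf16 on CUDA
    if HAS_FLASH_ATTN:
        # flash_attn uses (batch, seqlen, heads, head_dim), so transpose
        # in and out of its layout.
        out = flash_attn_func(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
        )
        return out.transpose(1, 2)
    # Fall back to whatever SDP backend Torch picks on this platform.
    return F.scaled_dot_product_attention(q, k, v)
```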