Question about Flux attention implementation
Hi,
When I read the code at https://github.com/kohya-ss/sd-scripts/blob/5dff02a65da38c411ee679821504ce947d2abd7d/library/sd3_models.py#L545-L565,
I see three different optimization approaches for the attention block in SD3.
But for the Flux model, which also has a similar MMDiT block, the attention implementation is at https://github.com/kohya-ss/sd-scripts/blob/5dff02a65da38c411ee679821504ce947d2abd7d/library/flux_models.py#L449-L455,
which only contains the SDPA approach, with no xformers path.
I searched for keywords like "xformers" and "flux", but it seems no one has discussed this difference.
So, may I ask the reason behind this? In my opinion, the same structure should benefit from the same optimizations. Would it be possible to add xformers support for Flux attention?
The Flux reference implementation did not come with xformers, so it is not included here. No specific reason, I believe. It could be added.
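
For reference, here is a minimal sketch of how an xformers path could sit alongside the existing SDPA call. The function name, the `attn_mode` argument, and the tensor layout are my assumptions for illustration, not the repo's actual API, and RoPE application is omitted:

```python
import torch
from einops import rearrange

try:
    import xformers.ops
except ImportError:
    xformers = None


def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, attn_mode: str = "torch") -> torch.Tensor:
    # Assumes q, k, v are shaped (batch, heads, seq_len, head_dim), as SDPA expects.
    if attn_mode == "xformers" and xformers is not None:
        # xformers.ops.memory_efficient_attention expects (batch, seq_len, heads, head_dim).
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))
        x = xformers.ops.memory_efficient_attention(q, k, v)
        x = rearrange(x, "b l h d -> b l (h d)")
    else:
        x = torch.nn.functional.scaled_dot_product_attention(q, k, v)
        x = rearrange(x, "b h l d -> b l (h d)")
    return x
```

The default branch stays identical to the current SDPA behavior, so opting into xformers would only change which attention kernel is dispatched.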