Question about Flux attention implementation
Hi,
When I read the code at https://github.com/kohya-ss/sd-scripts/blob/5dff02a65da38c411ee679821504ce947d2abd7d/library/sd3_models.py#L545-L565,
I see three different optimization approaches for the attention block in SD3.
But for the Flux model, which also has a similar MMDiT block, the attention implementation is at https://github.com/kohya-ss/sd-scripts/blob/5dff02a65da38c411ee679821504ce947d2abd7d/library/flux_models.py#L449-L455,
which only contains the SDPA approach, with no xformers path.
I searched for keywords like "xformers" and "flux", but it seems no one has discussed this difference.
So, may I ask the reason behind this? In my opinion, the same structure should benefit from the same optimizations. Would it be possible to add xformers support for Flux attention?
The Flux reference implementation did not come with xformers, so it is not included here. No specific reason, I believe. It could be added.
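
For reference, here is a minimal sketch of how an xformers path could sit alongside the existing SDPA call. The function name, the `attn_mode` argument, and the tensor layout are my assumptions for illustration, not the repo's actual API, and RoPE application is omitted:

```python
import torch
from einops import rearrange

try:
    import xformers.ops
except ImportError:
    xformers = None


def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor, attn_mode: str = "torch") -> torch.Tensor:
    # Assumes q, k, v are shaped (batch, heads, seq_len, head_dim), as SDPA expects.
    if attn_mode == "xformers" and xformers is not None:
        # xformers.ops.memory_efficient_attention expects (batch, seq_len, heads, head_dim).
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))
        x = xformers.ops.memory_efficient_attention(q, k, v)
        x = rearrange(x, "b l h d -> b l (h d)")
    else:
        x = torch.nn.functional.scaled_dot_product_attention(q, k, v)
        x = rearrange(x, "b h l d -> b l (h d)")
    return x
```

The default branch stays identical to the current SDPA behavior, so opting into xformers would only change which attention kernel is dispatched.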