Deepak Singh

Results: 11 comments by Deepak Singh

`out_fa_cute = fa_cute(q=q, k=k, v=v, cu_seqlens_q=cu_seqlens_q, cu_seqlens_k=cu_seqlens_k, max_seqlen_q = max_seqlen_k)` — how are you able to use the param `max_seqlen_q`? I wasn't able to find it in the implementation for fa...
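(For context: in a varlen attention API, `max_seqlen_*` is simply the longest per-sequence length implied by the cumulative `cu_seqlens_*` offsets. A minimal pure-Python sketch of that relationship — the helper name is hypothetical, not part of flash-attn:)

```python
def max_seqlen_from_cu(cu_seqlens):
    """Hypothetical helper: recover the longest sequence length from
    cumulative offsets. E.g. cu_seqlens = [0, 3, 7, 12] packs three
    sequences of lengths 3, 4, and 5."""
    return max(end - start for start, end in zip(cu_seqlens, cu_seqlens[1:]))

print(max_seqlen_from_cu([0, 3, 7, 12]))  # -> 5
```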

To use the above version, will I need to update the references in the WAN code where it was using FA2 so they point to this library instead? Is my understanding correct? I...

Yeah, the cute version you shared above is FA4, right? I tried the commands you shared, but when I ran the WAN model, it's not able to use and reference...

Yes, thank you for the thoughts and instructions. So I updated the model to reference the cute impl (FA4), and since I was using `flash_attn_varlen_func`, moving from FA2 to...

> Can you clarify what isn't working properly? Also FA4 varlen does not yet support `max_seqlen_*`.

`Can you clarify what isn't working properly` -> The response from `flash_attn_varlen_func` for...

Any thoughts on the above @tridao ?

I'm only running the fwd pass. I tried to integrate FA4 [here](https://github.com/Wan-Video/Wan2.2/blob/e9783574ef77be11fcab9aa5607905402538c08d/wan/modules/attention.py#L113) in place of the FA2 call, which gave me different results. The only thing that I can see differs...

For more context, I verified the above with a script that calls both the FA2 and FA4 variants and compares their outputs, and they don't match at 1e-4 tolerance. @tridao FA2...
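The comparison methodology can be sketched in pure Python. This is an illustrative stand-in, not the actual GPU script: a reference single-query scaled-dot-product attention plus a max-abs-diff check at the same 1e-4 tolerance that would be applied to the FA2 and FA4 output tensors.

```python
import math

def attention_row(q_row, ks, vs):
    """Reference scaled-dot-product attention for one query row (pure Python).
    q_row: list[float] of length d; ks, vs: lists of such rows."""
    scale = 1.0 / math.sqrt(len(q_row))
    scores = [scale * sum(qi * ki for qi, ki in zip(q_row, k)) for k in ks]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    denom = sum(exps)
    probs = [e / denom for e in exps]
    dim = len(vs[0])
    return [sum(p * v[d] for p, v in zip(probs, vs)) for d in range(dim)]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two output rows."""
    return max(abs(x - y) for x, y in zip(a, b))

# Stand-in for comparing the two kernels' outputs at 1e-4 tolerance:
out_a = attention_row([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
out_b = attention_row([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print("within 1e-4:", max_abs_diff(out_a, out_b) <= 1e-4)
```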

Hi @XiaomingXu1995, I'm also trying out `flash_attn_cute` on the same chip, but I'm facing issues during setup, as the library isn't able to find the installation path. Can you please...

> I just run `pip install --no-build-isolation -e .` in the `flash_attn/cute` directory on conda environment.
>
> And make sure there is no other version of flash_attn (e.g. flash_attn...
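One way to check which copy of a package Python would actually import (useful when a stale `flash_attn` install shadows the editable cute one) is `importlib.util.find_spec`. The snippet below is a generic diagnostic sketch, not something from the flash-attn repo:

```python
import importlib.util

def locate(modname):
    """Return the file a module would be imported from, or None if it
    cannot be found on the current sys.path."""
    try:
        spec = importlib.util.find_spec(modname)
    except ModuleNotFoundError:
        return None
    return getattr(spec, "origin", None) if spec else None

# Check whether (and from where) flash_attn would be imported;
# an unexpected site-packages path here points to a leftover install.
for name in ("flash_attn", "flash_attn.cute"):
    print(name, "->", locate(name))
```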