Deepak Singh

Results: 11 comments by Deepak Singh

`out_fa_cute = fa_cute(q=q, k=k, v=v, cu_seqlens_q=cu_seqlens_q, cu_seqlens_k=cu_seqlens_k, max_seqlen_q = max_seqlen_k)` — how are you able to use the param `max_seqlen_q`? I wasn't able to find it in the implementation for fa...
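(For context: in a varlen attention API, `max_seqlen_*` is simply the longest per-sequence length implied by the cumulative `cu_seqlens_*` offsets. A minimal pure-Python sketch of that relationship — the helper name is hypothetical, not part of flash-attn:)

```python
def max_seqlen_from_cu(cu_seqlens):
    """Hypothetical helper: recover the longest sequence length from
    cumulative offsets. E.g. cu_seqlens = [0, 3, 7, 12] packs three
    sequences of lengths 3, 4, and 5."""
    return max(end - start for start, end in zip(cu_seqlens, cu_seqlens[1:]))

print(max_seqlen_from_cu([0, 3, 7, 12]))  # -> 5
```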

To use the above version, will I need to update the references in the WAN code where it was using FA2 so they point to this library instead? Is my understanding correct? I...

Yeah, the cute version you shared above is FA4, right? I tried the commands you shared, but when I ran the WAN model, it's not able to use and reference...

Yes, thank you for the thoughts and instructions. So I updated the model to reference the cute impl (FA4), and since I was using `flash_attn_varlen_func`, moving from FA2 to...

> Can you clarify what isn't working properly? Also FA4 varlen does not yet support `max_seqlen_*`.

`Can you clarify what isn't working properly` -> The response from `flash_attn_varlen_func` for...

Any thoughts on the above @tridao ?

I'm only running the fwd pass. I tried to integrate FA4 [here](https://github.com/Wan-Video/Wan2.2/blob/e9783574ef77be11fcab9aa5607905402538c08d/wan/modules/attention.py#L113) in place of the FA2 call, which gave me different results. The only thing that I can see differs...

For more context, I verified the above with a script that calls both the FA2 and FA4 variants and compares their outputs, and they don't match at 1e-4 tolerance. @tridao FA2...
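The comparison methodology can be sketched in pure Python. This is an illustrative stand-in, not the actual GPU script: a reference single-query scaled-dot-product attention plus a max-abs-diff check at the same 1e-4 tolerance that would be applied to the FA2 and FA4 output tensors.

```python
import math

def attention_row(q_row, ks, vs):
    """Reference scaled-dot-product attention for one query row (pure Python).
    q_row: list[float] of length d; ks, vs: lists of such rows."""
    scale = 1.0 / math.sqrt(len(q_row))
    scores = [scale * sum(qi * ki for qi, ki in zip(q_row, k)) for k in ks]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    denom = sum(exps)
    probs = [e / denom for e in exps]
    dim = len(vs[0])
    return [sum(p * v[d] for p, v in zip(probs, vs)) for d in range(dim)]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two output rows."""
    return max(abs(x - y) for x, y in zip(a, b))

# Stand-in for comparing the two kernels' outputs at 1e-4 tolerance:
out_a = attention_row([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
out_b = attention_row([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print("within 1e-4:", max_abs_diff(out_a, out_b) <= 1e-4)
```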

Hi @XiaomingXu1995, I'm also trying out `flash_attn_cute` on the same chip, but I'm facing issues during setup, as the library isn't able to find the installation path. Can you please...

> I just run `pip install --no-build-isolation -e .` in the `flash_attn/cute` directory on conda environment.
>
> And make sure there is no other version of flash_attn (e.g. flash_attn...
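One way to check which copy of a package Python would actually import (useful when a stale `flash_attn` install shadows the editable cute one) is `importlib.util.find_spec`. The snippet below is a generic diagnostic sketch, not something from the flash-attn repo:

```python
import importlib.util

def locate(modname):
    """Return the file a module would be imported from, or None if it
    cannot be found on the current sys.path."""
    try:
        spec = importlib.util.find_spec(modname)
    except ModuleNotFoundError:
        return None
    return getattr(spec, "origin", None) if spec else None

# Check whether (and from where) flash_attn would be imported;
# an unexpected site-packages path here points to a leftover install.
for name in ("flash_attn", "flash_attn.cute"):
    print(name, "->", locate(name))
```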