Mayank Mishra
aah, yikes @ani300 I had started working on the same thing https://github.com/Dao-AILab/flash-attention/pull/1145 😓 I'll let you handle this 😃
@GLivshits I don't think it can be handled in older versions of torch
@tridao @ani300 are there any progress updates on this? It's a pretty neat feature to have Flash Attention fully end-to-end traceable natively.
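For context, a minimal sketch of what native traceability would enable, assuming the standard `flash_attn_func` interface from the `flash_attn` package (shapes and hyperparameters here are illustrative, not from this thread):

```python
import torch
from flash_attn import flash_attn_func

# fullgraph=True only succeeds if every op in the function is traceable,
# which is exactly what end-to-end traceability of flash-attn would provide.
@torch.compile(fullgraph=True)
def attention_block(q, k, v):
    return flash_attn_func(q, k, v, causal=True)

# (batch, seqlen, nheads, headdim), bf16 on GPU — placeholder sizes
q = k = v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.bfloat16)
out = attention_block(q, k, v)
```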
Hey, this is expected behaviour. FSDP-1 only allows accumulation in 16-bit precision. This is not the case for FSDP-2, which allows accumulation in both 16-bit and 32-bit precision.
documentation for FSDP-1:
documentation for FSDP-2:
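For reference, a minimal sketch of where the accumulation/reduction precision is selected in FSDP-2, assuming the `fully_shard` / `MixedPrecisionPolicy` API from recent PyTorch releases (the exact import path varies by version; the module below is a placeholder):

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import fully_shard, MixedPrecisionPolicy

model = nn.TransformerEncoderLayer(d_model=1024, nhead=16)  # placeholder module

# FSDP-2: parameters are cast to bf16 for compute, while gradients can be
# reduced and accumulated in fp32, since reduce_dtype is independent of param_dtype.
policy = MixedPrecisionPolicy(
    param_dtype=torch.bfloat16,   # 16-bit compute
    reduce_dtype=torch.float32,   # 32-bit gradient reduction/accumulation
)
fully_shard(model, mp_policy=policy)
```

Setting `reduce_dtype=torch.bfloat16` instead gives the 16-bit accumulation behaviour that FSDP-1 is limited to.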
this project is not really maintained anymore; I suggest looking at other alternatives
Aah, here we go. Is Flash Attention merged into the original repo? I saw Tri Dao had opened a PR.