Mayank Mishra

187 comments by Mayank Mishra

aah, yikes @ani300, I had started working on the same thing https://github.com/Dao-AILab/flash-attention/pull/1145 😓 I'll let you handle this 😃

@GLivshits I don't think this can be handled in older versions of torch.

@tridao @ani300 is there any progress or update on this? It's a pretty neat feature to have Flash Attention fully traceable end-to-end natively.

Hey, this is expected behaviour. FSDP-1 only allows accumulation in 16-bit precision. This is not the case for FSDP-2, which allows accumulation in both 16-bit and 32-bit precision.

documentation for FSDP-1:
documentation for FSDP-2:
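
To illustrate the difference, here is a minimal sketch (mine, not from the thread) of configuring FSDP-2 to keep gradient reduction/accumulation in 32-bit while parameters and compute stay in bf16. It assumes torch >= 2.6, where `fully_shard` and `MixedPrecisionPolicy` are exported from `torch.distributed.fsdp` (in earlier releases they lived under `torch.distributed._composable.fsdp`):

```python
# Sketch: FSDP-2 mixed precision with fp32 gradient reduction.
# Assumes a distributed process group has already been initialized.
import torch
import torch.nn as nn
from torch.distributed.fsdp import fully_shard, MixedPrecisionPolicy

model = nn.Sequential(nn.Linear(1024, 1024), nn.Linear(1024, 1024))

# Parameters and forward/backward compute run in bf16, but the gradient
# reduce-scatter accumulates in fp32 -- the option the comment above says
# FSDP-1 does not offer.
policy = MixedPrecisionPolicy(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.float32,
)

# Shard each layer, then the root module, applying the policy throughout.
for layer in model:
    fully_shard(layer, mp_policy=policy)
fully_shard(model, mp_policy=policy)
```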

Aah, here we go. Is Flash Attention merged into the original repo? I saw Tri Dao had opened a PR.