LoRA + FlashAttention2 speed up?
When fine-tuning Mistral with LoRA, does FlashAttention2 help speed up the process? If so, how significant is the acceleration, and where does the primary speedup come from?
Thanks for your interest in LMFlow! Theoretically I think it helps, since FlashAttention improves the cache-friendliness of the attention operations, and it should also speed up the forward pass of the frozen base model in LoRA fine-tuning. However, we haven't done empirical tests on this matter, which is indeed an interesting topic 😄
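For reference, here is a rough sketch (not LMFlow's built-in pipeline) of how one might combine FlashAttention-2 with LoRA using Hugging Face `transformers` and `peft`. It assumes a recent `transformers` release that supports the `attn_implementation` argument; older versions used a different flag, and the `flash-attn` package must be installed separately.

```python
# Minimal sketch: load Mistral with FlashAttention-2 enabled, then attach
# a LoRA adapter. Assumes recent `transformers` (>= ~4.36) and `peft`;
# flag names may differ in older versions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,               # FlashAttention-2 requires fp16/bf16
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)

# LoRA only trains small adapter matrices; the frozen base model still runs
# its full attention forward pass, which is where FlashAttention-2 would help.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The expected gain would mostly show up with long sequences and larger batch sizes, where the attention computation and its memory traffic dominate the step time, but again, we haven't benchmarked this ourselves.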