LoRA + FlashAttention2 speed up?
When fine-tuning Mistral with LoRA, does FlashAttention2 help speed up the process? If so, how significant is the acceleration, and where does the primary speedup come from?
Thanks for your interest in LMFlow! Theoretically I think it helps, since FlashAttention improves the cache-friendliness of the attention operations, and it should also speed up the forward pass of the frozen base model in LoRA fine-tuning. However, we haven't done empirical tests on this matter, which is indeed an interesting topic 😄
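For reference, here is a rough sketch (not LMFlow's built-in pipeline) of how one might combine FlashAttention-2 with LoRA using Hugging Face `transformers` and `peft`. It assumes a recent `transformers` release that supports the `attn_implementation` argument; older versions used a different flag, and the `flash-attn` package must be installed separately.

```python
# Minimal sketch: load Mistral with FlashAttention-2 enabled, then attach
# a LoRA adapter. Assumes recent `transformers` (>= ~4.36) and `peft`;
# flag names may differ in older versions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,               # FlashAttention-2 requires fp16/bf16
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)

# LoRA only trains small adapter matrices; the frozen base model still runs
# its full attention forward pass, which is where FlashAttention-2 would help.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The expected gain would mostly show up with long sequences and larger batch sizes, where the attention computation and its memory traffic dominate the step time, but again, we haven't benchmarked this ourselves.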