
Hello from HF Diffusers

sayakpaul opened this issue on Aug 15 '24 · 3 comments

Thanks for the incredibly clean repository!

I am Sayak from the Diffusers team at Hugging Face. My question is probably very naive, so I apologize for that in advance.

I wanted to know if linear attention could be applied at inference time only. More precisely, can I take a model trained with regular attention and turn it into a linear attention model during inference?

sayakpaul · Aug 15 '24 08:08

Hello Sayak,

Thanks for your interest! Unfortunately, we cannot directly convert a model with softmax attention to one with linear attention at inference time without any additional training. However, it is indeed possible to finetune pretrained LLMs for a relatively small number of steps (far fewer than training from scratch) to switch from regular attention to linear attention. You can refer to these resources for more details: arXiv:2405.06640, OpenReview, etc.
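
For intuition, here is a minimal, self-contained PyTorch sketch of the general idea, not the actual method from the paper or this repo: a linear-attention layer reuses the pretrained softmax layer's projections and is briefly finetuned to match its outputs. The class and function names are hypothetical, and the `elu(x) + 1` feature map is just one common choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftmaxAttention(nn.Module):
    """Single-head softmax attention, standing in for a pretrained layer."""
    def __init__(self, dim):
        super().__init__()
        self.qkv, self.out = nn.Linear(dim, 3 * dim), nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return self.out(attn @ v)

class LinearAttention(nn.Module):
    """Drop-in replacement: softmax(QK^T)V -> phi(Q)(phi(K)^T V),
    with phi(x) = elu(x) + 1, giving O(N) instead of O(N^2) cost."""
    def __init__(self, dim):
        super().__init__()
        self.qkv, self.out = nn.Linear(dim, 3 * dim), nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = F.elu(q) + 1, F.elu(k) + 1
        kv = torch.einsum('bnd,bne->bde', k, v)      # phi(K)^T V, (b, d, d)
        z = q @ k.sum(dim=1).unsqueeze(-1)           # per-token normalizer
        return self.out(torch.einsum('bnd,bde->bne', q, kv) / (z + 1e-6))

def distill_linear(teacher: SoftmaxAttention, dim=64, steps=100):
    """Initialize a linear layer from the pretrained weights, then finetune
    it for a few steps to match the softmax layer's outputs."""
    student = LinearAttention(dim)
    student.load_state_dict(teacher.state_dict())    # reuse Q/K/V/out weights
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.randn(8, 128, dim)                 # stand-in for real data
        loss = F.mse_loss(student(x), teacher(x).detach())
        opt.zero_grad(); loss.backward(); opt.step()
    return student

teacher = SoftmaxAttention(64)                       # stands in for a pretrained layer
student = distill_linear(teacher)
```

In practice this swap-and-distill step is applied per layer across the whole model, followed by a short end-to-end finetune; the papers above describe the full recipe.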

sustcsonglin · Aug 15 '24 08:08

@sayakpaul FYI, we have released some weights converted from Mistral-7B-v0.1 following arXiv:2405.06640. You can give them a try by loading fla-hub/gla-7B-mistral-20B, fla-hub/gsa-7B-mistral-20B, or fla-hub/gsa-7B-mistral-100B.
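
For reference, a minimal sketch of loading one of these checkpoints with transformers, assuming the `fla` package is installed (importing it registers the model architectures with the Auto classes); the dtype, device, and generation settings below are illustrative:

```python
import fla  # noqa: F401  (registers fla model classes with transformers)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = 'fla-hub/gla-7B-mistral-20B'
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, device_map='auto'
)

inputs = tokenizer('Linear attention is', return_tensors='pt').to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```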

yzhangcs · Aug 15 '24 09:08

Would there be any interest in trying something similar for diffusion models? I am happy to team up. I believe this would be a very significant contribution to the community now that large models like Flux are emerging.

sayakpaul · Aug 15 '24 09:08

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] · Sep 15 '24 00:09

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] · Sep 22 '24 00:09