transformers
ValueError: FalconMambaForCausalLM does not support Flash Attention 2.0 yet
I'm facing latency issues when running inference with the Falcon LLM: for my specific use case, a single run takes around 20-30 minutes. To reduce this, I looked into Flash Attention 2, which is reported to significantly speed up inference, but it turns out it is not supported for this model yet and loading fails with the error above.
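For reference, this is a minimal sketch of how I'm loading the model (assuming the `tiiuae/falcon-mamba-7b` checkpoint; my actual checkpoint and generation settings may differ), which reproduces the error:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"  # assumed checkpoint for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Requesting Flash Attention 2 raises:
# ValueError: FalconMambaForCausalLM does not support Flash Attention 2.0 yet
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```

Loading without `attn_implementation="flash_attention_2"` works, but inference remains slow.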
I'd request the team/maintainers to look into this and add Flash Attention 2 support as soon as possible.