transformers
ValueError: FalconMambaForCausalLM does not support Flash Attention 2.0 yet
I'm facing latency issues when running inference with the Falcon LLM: for my specific use case, a single run takes around 20-30 minutes. To reduce this, I looked into Flash Attention 2, which is reported to significantly speed up inference, but it turns out it is not supported for this model yet and loading fails with the error above.
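For reference, this is a minimal sketch of how I'm loading the model (assuming the `tiiuae/falcon-mamba-7b` checkpoint; my actual checkpoint and generation settings may differ), which reproduces the error:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"  # assumed checkpoint for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Requesting Flash Attention 2 raises:
# ValueError: FalconMambaForCausalLM does not support Flash Attention 2.0 yet
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```

Loading without `attn_implementation="flash_attention_2"` works, but inference remains slow.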
I'd request the team/maintainers to look into this and add Flash Attention 2 support as soon as possible.