flash-attention
flash-attention imported, not running
I get the warning: "You are not running the flash-attention implementation, expect numerical differences." I am just running basic inference with the Microsoft Phi-3-mini-128k-instruct model on CUDA. My setup: NVIDIA GeForce RTX 2080, driver version 546.12, CUDA version 12.3, bitsandbytes version 0.43.1. In addition, I get the warning: "Current flash-attenton does not support window_size. Either upgrade or use attn_implementation='eager'."
How can I resolve this? Thanks.
The RTX 2080 (Turing) is not supported in the latest version; FlashAttention-2 requires Ampere or newer GPUs, so on Turing you have to fall back to attn_implementation='eager' (see the sketch below).
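A minimal workaround sketch, assuming transformers >= 4.41 and the Hugging Face model id microsoft/Phi-3-mini-128k-instruct: loading the model with the eager attention implementation avoids both warnings on a Turing GPU, at some cost in speed and memory.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda",
    attn_implementation="eager",  # fall back to standard attention on Turing GPUs
    trust_remote_code=True,
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```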
I also ran into this problem while running the Mini-InternVL-4B pretrained model: I get the warning "You are not running the flash-attention implementation, expect numerical differences." This is on an A100 server. torch version: 2.1.0a0+4136153, flash-attn version: 2.3.6, transformers version: 4.41.2.
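Since the A100 (Ampere) does support FlashAttention-2, this warning usually means the model was loaded with the default attention backend rather than flash-attn. A quick diagnostic sketch, under the assumption of a single-GPU setup (the checks are generic, not specific to Mini-InternVL):

```python
import torch

# Confirm flash-attn is importable in the active environment.
try:
    import flash_attn
    print("flash-attn version:", flash_attn.__version__)
except ImportError as exc:
    print("flash-attn is not importable:", exc)

# FlashAttention-2 needs compute capability >= 8.0 (Ampere or newer);
# an A100 reports 8.0, so the GPU itself is not the problem here.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU compute capability: {major}.{minor}")

# If both checks pass, load the model with attn_implementation="flash_attention_2"
# (where the model's loading code supports that argument) so transformers
# actually routes attention through flash-attn instead of the default backend.
```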