flash-attention
flash-attention imported, not running
I get the warning: "You are not running the flash-attention implementation, expect numerical differences." I am just running basic inference with the Microsoft Phi-3-mini-128k-instruct model on CUDA. My setup: NVIDIA GeForce RTX 2080, driver version 546.12, CUDA version 12.3, bitsandbytes version 0.43.1. In addition, I get the warning: "Current flash-attenton does not support window_size. Either upgrade or use attn_implementation='eager'."
How can I resolve this? Thanks.
The RTX 2080 (Turing) is not supported in the latest version; FlashAttention-2 requires Ampere or newer GPUs, so on Turing you have to fall back to attn_implementation='eager' (see the sketch below).
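A minimal workaround sketch, assuming transformers >= 4.41 and the Hugging Face model id microsoft/Phi-3-mini-128k-instruct: loading the model with the eager attention implementation avoids both warnings on a Turing GPU, at some cost in speed and memory.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda",
    attn_implementation="eager",  # fall back to standard attention on Turing GPUs
    trust_remote_code=True,
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```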
I also ran into this problem while running the Mini-InternVL-4B pretrained model: I get the warning "You are not running the flash-attention implementation, expect numerical differences." This is on an A100 server. torch version: 2.1.0a0+4136153, flash-attn version: 2.3.6, transformers version: 4.41.2.
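Since the A100 (Ampere) does support FlashAttention-2, this warning usually means the model was loaded with the default attention backend rather than flash-attn. A quick diagnostic sketch, under the assumption of a single-GPU setup (the checks are generic, not specific to Mini-InternVL):

```python
import torch

# Confirm flash-attn is importable in the active environment.
try:
    import flash_attn
    print("flash-attn version:", flash_attn.__version__)
except ImportError as exc:
    print("flash-attn is not importable:", exc)

# FlashAttention-2 needs compute capability >= 8.0 (Ampere or newer);
# an A100 reports 8.0, so the GPU itself is not the problem here.
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU compute capability: {major}.{minor}")

# If both checks pass, load the model with attn_implementation="flash_attention_2"
# (where the model's loading code supports that argument) so transformers
# actually routes attention through flash-attn instead of the default backend.
```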