text-generation-webui
UserWarning: 1Torch was not compiled with flash attention.
Describe the bug
When I load my model and try to use it, I get an error:
13:41:11-717356 INFO Saved "I:\programming\text-generation-webui\presets\My Preset.yaml".
I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py:671: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Load the model: https://huggingface.co/WhiteRabbitNeo/WhiteRabbitNeo-33B-v1.5
Try to use it.
Get the error.
Screenshot
Logs
13:40:23-395560 INFO Loaded the model in 77.23 seconds.
13:41:11-717356 INFO Saved "I:\programming\text-generation-webui\presets\My Preset.yaml".
I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py:671: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
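For context, this warning means the installed PyTorch wheel was built without the flash-attention kernel, so torch.nn.functional.scaled_dot_product_attention falls back to another backend at call time. A minimal diagnostic sketch, assuming a CUDA build of PyTorch 2.x, to see what the current install reports:

# Diagnostic sketch: print which scaled_dot_product_attention backends
# the installed PyTorch build advertises (PyTorch >= 2.0 assumed).
import torch

print("torch version:", torch.__version__)
print("built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

# These flags only say whether each backend is enabled in the dispatcher;
# a build compiled without flash attention will still warn at call time,
# so treat this as a first check rather than a definitive answer.
print("flash SDP enabled:", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient SDP enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math SDP enabled:", torch.backends.cuda.math_sdp_enabled())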
System Info
NVIDIA RTX 2080 Ti
Windows 11 Home Version 10.0.22631 Build 22631
System Model X570 AORUS PRO WIFI
System Type x64-based PC
Processor AMD Ryzen 7 5800X 8-Core Processor, 3801 MHz, 8 Core(s), 16 Logical Processor(s)
Did you forget to put the Python install commands for pip? You just have print statements. The main reason I'm asking is that I checked whether PyTorch was installed and it didn't seem to be, so I'm a bit confused. Maybe I have a scuffed Python env.
After going to https://pytorch.org/ and running their install command:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
I found that my Python did have torch installed.
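If you want to confirm which wheel actually ended up in the environment (the cu121 build rather than a CPU-only one), a quick check along these lines should work, run inside the Python environment the webui uses (e.g. installer_files\env):

# Sanity check of the installed torch wheel; the version suffix and
# CUDA availability tell you whether the cu121 build was picked up.
import torch

print(torch.__version__)          # e.g. "2.2.0+cu121" vs. "2.2.0+cpu"
print(torch.cuda.is_available())  # False suggests a CPU-only wheel or a driver issue
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))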
Well, updating PyTorch gave me more errors, lol.
14:19:03-815672 ERROR Could not find the character "[]" inside characters/. No character has been loaded.
Traceback (most recent call last):
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1550, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1185, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 661, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\modules\chat.py", line 664, in load_character
raise ValueError
ValueError
I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py:671: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Same problem here. Seems like a regression
I'm not too familiar with machine learning in general, so I can't really be much help with that :(. Do you know if we might need some kind of different packages?
It used to work. Some dev broke something. We need them to fix it. As you've already discovered, trying to fix it yourself just breaks more stuff
I'm seeing this warning too. Model seems to run despite it.
Seems slow to me though. You?
I was using llama-2-7b-chat-hf for a project on my RTX 4050 and I get the same warning. The response also takes 1 hour to generate.
An hour seems far too long for a response. Are you using a pipeline to evaluate?
Well, not having flash attention makes a big difference, especially in memory-constrained scenarios. People need to stop rushing releases. I've already switched to ollama and will probably evaluate llm studio today.
^ ty @oldmanjk
If you are on Windows, be advised that the nightlies do not have FA v2 (i.e., they don't have FA at all); see https://github.com/pytorch/pytorch/issues/108175
I'm on Linux stable. No flash attention.
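One way to tell whether an install can actually run the flash kernel, rather than just warning and falling back, is to restrict SDPA to the flash backend and see if a call succeeds. A rough sketch, assuming a CUDA GPU and PyTorch 2.0-2.2, where torch.backends.cuda.sdp_kernel is still the backend-selection context manager (newer releases move this to torch.nn.attention.sdpa_kernel):

# Probe: allow only the flash backend; if the build lacks flash attention,
# the call should raise instead of silently using another kernel.
import torch

q = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)

try:
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        torch.nn.functional.scaled_dot_product_attention(q, k, v)
    print("flash attention kernel ran")
except RuntimeError as err:
    print("flash attention kernel unavailable:", err)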
Same warning for Llama-2-13b-chat-hf.
D:\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py:670: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Output generated in 254.98 seconds (0.70 tokens/s, 178 tokens, context 78, seed 2082798633)
I have the same problem with Qwen 1.5 on Windows. I found that, regardless of whether or not flash-attn is installed with the corresponding version of PyTorch, I don't have this problem when using torch 2.1. When using torch 2.2, LLM inference gives the following warning:
D:\Project\AIGC\temp\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\qwen2\modeling_qwen2.py:693: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Output generated in 29.86 seconds (0.87 tokens/s, 26 tokens, context 59, seed 1812789762)
After installing torch version 2.1, the problem disappeared:
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
Inference speed:
Output generated in 29.40 seconds (15.00 tokens/s, 441 tokens, context 113, seed 601091263)
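The gap between 0.87 and 15 tokens/s is consistent with which attention kernel ends up being used. To see the kernel-level difference in isolation, here is a small timing sketch (PyTorch 2.x with CUDA assumed; it measures only the attention call, not end-to-end generation, and the numbers will vary with GPU and sequence length):

# Micro-benchmark: math-only SDPA fallback vs. whatever fused backend
# (flash or memory-efficient) the installed build provides.
import time
import torch

def bench(enable_flash, enable_mem_efficient, label, iters=50):
    q = torch.randn(1, 32, 2048, 128, device="cuda", dtype=torch.float16)
    k, v = q.clone(), q.clone()
    with torch.backends.cuda.sdp_kernel(
        enable_flash=enable_flash,
        enable_math=True,
        enable_mem_efficient=enable_mem_efficient,
    ):
        torch.nn.functional.scaled_dot_product_attention(q, k, v)  # warm-up
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            torch.nn.functional.scaled_dot_product_attention(q, k, v)
        torch.cuda.synchronize()
    print(f"{label}: {(time.perf_counter() - start) / iters * 1e3:.2f} ms/call")

bench(False, False, "math fallback only")
bench(True, True, "fused backends (if compiled in)")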
If you are here because of the Windows ComfyUI and you are using the portable version, this specific error is probably rooted in the way the portable version is designed. For me, after doing the non-intuitive git clone install and manually installing torch using pip, the error was gone. Yes, that means a lot of reading. Enjoy.
The error is because ComfyUI updated its dependencies past 2.1.2+cu121 a couple of months ago, apparently without taking into account that this would be guaranteed to cause the error in all cases on Windows, since Flash Attention was present in that version and lower but simply isn't anymore, for unclear reasons. It's not "a lot of reading". You just have to manually reinstall specifically 2.1.2+cu121, which is the last version where Flash Attention existed in any way on Windows.
Do you have the commands to do that? I'm not sure what the torchvision version should be in that case.
The command is:
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
But then xformers will complain, because the latest version requires PyTorch 2.3.0...
I am not sure if I want to go find out which xformers version works with PyTorch 2.1.2...
Might try to find out how to get flash attention added to PyTorch 2.3.
Thanks for the help!
For the xformers version, I downgraded to 0.0.22.post4:
pip install xformers==0.0.22.post4 --index-url https://download.pytorch.org/whl/cu121
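If you are unsure whether the torch and xformers builds in an environment actually pair up, printing both versions from the same interpreter is a quick check (run it inside the environment ComfyUI or the webui uses; python -m xformers.info should give a fuller report, including which memory-efficient attention ops are available):

# Check that torch and xformers come from the same environment and
# match the versions installed above.
import torch
import xformers

print("torch   :", torch.__version__)     # e.g. 2.1.2+cu121 after the downgrade
print("xformers:", xformers.__version__)  # e.g. 0.0.22.post4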
Sorry to ask a newbie question, but could you please kindly tell me in which folder exactly in ComfyUI you performed those install commands?
Run cmd in your python_embeded folder, for example "X:\ComfyUI_windows_portable\python_embeded", then paste the commands inside the command prompt window.
In addition, from cmd you can check the dependency versions of your installed packages. Besides the pip show [package name] command, there is pipdeptree. Just do pip install pipdeptree, then run pipdeptree.