text-generation-webui
UserWarning: 1Torch was not compiled with flash attention.
Describe the bug
When I load my model and try to use it, I get an error:
13:41:11-717356 INFO Saved "I:\programming\text-generation-webui\presets\My Preset.yaml".
I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py:671: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Load the model: https://huggingface.co/WhiteRabbitNeo/WhiteRabbitNeo-33B-v1.5
Try to use it.
Get the error.
Screenshot
Logs
13:40:23-395560 INFO Loaded the model in 77.23 seconds.
13:41:11-717356 INFO Saved "I:\programming\text-generation-webui\presets\My Preset.yaml".
I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py:671: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
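For context, this warning means the installed PyTorch wheel was built without the flash-attention kernel, so torch.nn.functional.scaled_dot_product_attention falls back to another backend at call time. A minimal diagnostic sketch, assuming a CUDA build of PyTorch 2.x, to see what the current install reports:

# Diagnostic sketch: print which scaled_dot_product_attention backends
# the installed PyTorch build advertises (PyTorch >= 2.0 assumed).
import torch

print("torch version:", torch.__version__)
print("built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

# These flags only say whether each backend is enabled in the dispatcher;
# a build compiled without flash attention will still warn at call time,
# so treat this as a first check rather than a definitive answer.
print("flash SDP enabled:", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient SDP enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math SDP enabled:", torch.backends.cuda.math_sdp_enabled())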
System Info
NVIDIA RTX 2080 Ti
Windows 11 Home Version 10.0.22631 Build 22631
System Model X570 AORUS PRO WIFI
System Type x64-based PC
Processor AMD Ryzen 7 5800X 8-Core Processor, 3801 MHz, 8 Core(s), 16 Logical Processor(s)
Did you forget to put the Python install commands for pip? You just have print statements. The main reason I'm asking is that I checked whether PyTorch was installed and it didn't seem to be, so I'm a bit confused. Maybe I have a scuffed Python env.
After going to https://pytorch.org/ and running their install command:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
I found that my Python did have torch installed.
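If you want to confirm which wheel actually ended up in the environment (the cu121 build rather than a CPU-only one), a quick check along these lines should work, run inside the Python environment the webui uses (e.g. installer_files\env):

# Sanity check of the installed torch wheel; the version suffix and
# CUDA availability tell you whether the cu121 build was picked up.
import torch

print(torch.__version__)          # e.g. "2.2.0+cu121" vs. "2.2.0+cpu"
print(torch.cuda.is_available())  # False suggests a CPU-only wheel or a driver issue
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))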
Well, updating PyTorch gave me more errors, lol.
14:19:03-815672 ERROR Could not find the character "[]" inside characters/. No character has been loaded.
Traceback (most recent call last):
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1550, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1185, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 661, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "I:\programming\text-generation-webui\modules\chat.py", line 664, in load_character
raise ValueError
ValueError
I:\programming\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py:671: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Same problem here. Seems like a regression
I'm not too familiar with machine learning in general, so I can't really be much help with that :(. Do you know if we might need some kind of different packages?
It used to work. Some dev broke something. We need them to fix it. As you've already discovered, trying to fix it yourself just breaks more stuff
I'm seeing this warning too. Model seems to run despite it.
Seems slow to me though. You?
I was using llama-2-7b-chat-hf for a project on my RTX 4050 and I get the same warning. The response also takes 1 hour to generate.
An hour seems far too long for a response. Are you using a pipeline to evaluate?
Well, not having flash attention makes a big difference, especially in memory-constrained scenarios. People need to stop rushing releases. I've already switched to ollama and will probably evaluate llm studio today.
^ ty @oldmanjk
If you are on Windows, be advised that the nightlies do not have FA v2 (i.e., they don't have FA at all); see https://github.com/pytorch/pytorch/issues/108175
I'm on Linux stable. No flash attention.
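One way to tell whether an install can actually run the flash kernel, rather than just warning and falling back, is to restrict SDPA to the flash backend and see if a call succeeds. A rough sketch, assuming a CUDA GPU and PyTorch 2.0-2.2, where torch.backends.cuda.sdp_kernel is still the backend-selection context manager (newer releases move this to torch.nn.attention.sdpa_kernel):

# Probe: allow only the flash backend; if the build lacks flash attention,
# the call should raise instead of silently using another kernel.
import torch

q = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)

try:
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        torch.nn.functional.scaled_dot_product_attention(q, k, v)
    print("flash attention kernel ran")
except RuntimeError as err:
    print("flash attention kernel unavailable:", err)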
Same warning for Llama-2-13b-chat-hf.
D:\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py:670: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Output generated in 254.98 seconds (0.70 tokens/s, 178 tokens, context 78, seed 2082798633)
I have the same problem with Qwen 1.5 on Windows. I found that, regardless of whether or not flash-attn is installed with the corresponding version of PyTorch, I don't have this problem when using torch 2.1. When using torch 2.2, LLM inference gives the following warning:
D:\Project\AIGC\temp\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\qwen2\modeling_qwen2.py:693: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Output generated in 29.86 seconds (0.87 tokens/s, 26 tokens, context 59, seed 1812789762)
After installing torch version 2.1, the problem disappeared:
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
Inference speed:
Output generated in 29.40 seconds (15.00 tokens/s, 441 tokens, context 113, seed 601091263)
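The gap between 0.87 and 15 tokens/s is consistent with which attention kernel ends up being used. To see the kernel-level difference in isolation, here is a small timing sketch (PyTorch 2.x with CUDA assumed; it measures only the attention call, not end-to-end generation, and the numbers will vary with GPU and sequence length):

# Micro-benchmark: math-only SDPA fallback vs. whatever fused backend
# (flash or memory-efficient) the installed build provides.
import time
import torch

def bench(enable_flash, enable_mem_efficient, label, iters=50):
    q = torch.randn(1, 32, 2048, 128, device="cuda", dtype=torch.float16)
    k, v = q.clone(), q.clone()
    with torch.backends.cuda.sdp_kernel(
        enable_flash=enable_flash,
        enable_math=True,
        enable_mem_efficient=enable_mem_efficient,
    ):
        torch.nn.functional.scaled_dot_product_attention(q, k, v)  # warm-up
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            torch.nn.functional.scaled_dot_product_attention(q, k, v)
        torch.cuda.synchronize()
    print(f"{label}: {(time.perf_counter() - start) / iters * 1e3:.2f} ms/call")

bench(False, False, "math fallback only")
bench(True, True, "fused backends (if compiled in)")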
If you are here because of the Windows ComfyUI and you are using the portable version, this specific error is probably rooted in the way the portable version is designed. For me, after doing the non-intuitive git clone install and manually installing torch using pip, the error was gone. Yes, that means a lot of reading. Enjoy.
The error is because ComfyUI updated its dependencies past 2.1.2+cu121 a couple of months ago, apparently without taking into account that this would be guaranteed to cause the error in all cases on Windows, since Flash Attention was present in that version and lower but simply isn't anymore, for unclear reasons. It's not "a lot of reading". You just have to manually reinstall specifically 2.1.2+cu121, which is the last version where Flash Attention existed in any way on Windows.
Do you have the commands to do that? I'm not sure what the torchvision version should be in that case.
The command is:
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
But then xformers will complain, because the latest version requires PyTorch 2.3.0...
I am not sure if I want to go find out which xformers version works with PyTorch 2.1.2...
Might try to find out how to get flash attention added to PyTorch 2.3.
Thanks for the help!
For the xformers version, I downgraded to 0.0.22.post4:
pip install xformers==0.0.22.post4 --index-url https://download.pytorch.org/whl/cu121
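If you are unsure whether the torch and xformers builds in an environment actually pair up, printing both versions from the same interpreter is a quick check (run it inside the environment ComfyUI or the webui uses; python -m xformers.info should give a fuller report, including which memory-efficient attention ops are available):

# Check that torch and xformers come from the same environment and
# match the versions installed above.
import torch
import xformers

print("torch   :", torch.__version__)     # e.g. 2.1.2+cu121 after the downgrade
print("xformers:", xformers.__version__)  # e.g. 0.0.22.post4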
Sorry to ask a newbie question, but could you please kindly tell me in which folder exactly in ComfyUI you performed those install commands?
Run cmd in your python_embeded folder, for example "X:\ComfyUI_windows_portable\python_embeded", then paste the commands inside the command prompt window.
In addition, from cmd you can check the dependency versions of your installed packages. Besides the pip show [package name] command, there is pipdeptree. Just do pip install pipdeptree, then run pipdeptree.