text-generation-webui
M1 Mac --no-stream bug
Describe the bug
Launching the server with the --no-stream flag raises AssertionError("Torch not compiled with CUDA enabled") as soon as I try to generate a response to an input prompt.
If I omit --no-stream, the program works and generates a response as expected.
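The traceback in the logs points at an unconditional output.cuda() call in generate_reply, which cannot succeed on a PyTorch build without CUDA (as on Apple Silicon). A possible workaround is to choose the device at runtime instead. The snippet below is only a minimal sketch under that assumption, not the project's actual fix: pick_device and move_to_available_device are hypothetical helper names, and it has not been tested against modules/text_generation.py.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the name of the preferred torch device (hypothetical helper)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"  # Apple Silicon GPU backend (PyTorch 1.12+)
    return "cpu"


def move_to_available_device(tensor):
    """Move a tensor to an available device instead of calling .cuda() directly."""
    import torch  # imported lazily so pick_device stays usable without torch

    # Older PyTorch builds have no torch.backends.mps, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    device = pick_device(torch.cuda.is_available(), mps_ok)
    return tensor.to(device)
```

With something like this, the failing line would become output = move_to_available_device(output). On a machine without CUDA or MPS the tensor simply stays on the CPU.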
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
python server.py --model facebook_opt-6.7b --listen --no-stream
Screenshot
No response
Logs
Last login: Tue Apr 25 13:13:52 on ttys000
userx@Users-MacBook-Pro ~ % cd Ai
userx@Users-MacBook-Pro Ai % cd text-generation-webui
userx@Users-MacBook-Pro text-generation-webui % source venv/bin/activate
(venv) userx@Users-MacBook-Pro text-generation-webui % python server.py --model facebook_opt-6.7b --listen --no-stream
Gradio HTTP request redirected to localhost :)
bin /Users/userx/Ai/text-generation-webui/venv/lib/python3.11/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/Users/userx/Ai/text-generation-webui/venv/lib/python3.11/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
Loading facebook_opt-6.7b...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.51s/it]
Loaded the model in 8.05 seconds.
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/Users/userx/Ai/text-generation-webui/modules/text_generation.py", line 260, in generate_reply
    output = output.cuda()
             ^^^^^^^^^^^^^
  File "/Users/userx/Ai/text-generation-webui/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 256, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Output generated in 26.68 seconds (5.44 tokens/s, 145 tokens, context 3, seed 1036677000)
System Info
MacBook Pro (M1 Max, 32 GB RAM)
macOS Ventura 13.3.1