text-generation-webui: MacBook M1 No GPU Detected and RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

I can't use the UI with the LLaMA-7B model. I get the following output:

Loading LLaMA-7B...
Warning: no GPU has been detected. Falling back to CPU mode.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 33/33 [00:05<00:00, 6.12it/s]
Loaded the model in 78.64 seconds.
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
0%| | 0/26 [00:57<?, ?it/s]
Traceback (most recent call last):
  File "/Users/elahmday/miniconda3/lib/python3.10/site-packages/gradio/routes.py", line 374, in run_predict
    output = await app.get_blocks().process_api(
  File "/Users/elahmday/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 1017, in process_api
    result = await self.call_function(
  File "/Users/elahmday/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 849, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/Users/elahmday/miniconda3/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Users/elahmday/miniconda3/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/Users/elahmday/miniconda3/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/Users/elahmday/miniconda3/lib/python3.10/site-packages/gradio/utils.py", line 453, in async_iteration
    return next(iterator)
  File "/Users/elahmday/Desktop/LLaMA/text-generation-webui/modules/text_generation.py", line 213, in generate_reply
    output = eval(f"shared.model.generate({', '.join(generate_params)}){cuda}")[0]
  File "<string>", line 1, in <module>
  File "/Users/elahmday/miniconda3/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/elahmday/miniconda3/lib/python3.10/site-packages/transformers/generation/utils.py", line 1452, in generate
    return self.sample(
  File "/Users/elahmday/miniconda3/lib/python3.10/site-packages/transformers/generation/utils.py", line 2504, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
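The last frame shows the sampling step rejecting the probability tensor. As a rough illustration only, here is a plain-Python sketch (no PyTorch; `softmax` and `is_valid_distribution` are hypothetical helper names) of the condition the message describes, and of one way a single non-finite logit can turn every probability into nan. The thread does not establish where the invalid values actually come from, so treat the `inf` logit below as an assumed example, not a diagnosis.

```python
import math


def softmax(logits):
    """Softmax over a list of floats, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_valid_distribution(probs):
    """True when no entry is inf, nan, or negative (the condition in the error)."""
    return all(
        not math.isnan(p) and not math.isinf(p) and p >= 0
        for p in probs
    )


good = softmax([1.0, 2.0, 3.0])
# Assumed failure case: one infinite logit. inf - inf is nan, and the nan
# spreads to every entry through the normalising sum.
bad = softmax([1.0, float("inf"), 2.0])

print(is_valid_distribution(good))
print(is_valid_distribution(bad))
```

A check like this, run on the values just before sampling, would at least tell you whether the model is producing non-finite numbers or whether the problem is introduced later.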

Mar 10 '23 04:03 moelahmady

I get the same RuntimeError: probability tensor contains either inf, nan or element < 0 on Windows 11 with an NVIDIA RTX 3060.

Mar 10 '23 08:03 vmajor

Same. Please let me know if you fix it somehow.

Mar 10 '23 16:03 srijansaxena11

I'm having the same issue here on a MacBook M1 Pro. Let me know if you find a fix!

Mar 10 '23 21:03 jacobmlloyd

I set do_sample to false and it seems to work. Not sure how this will affect the overall response.

Mar 10 '23 21:03 jacobmlloyd

Where did you set this?

Mar 10 '23 23:03 moelahmady

In the web interface, under the custom generation parameters, you can uncheck the do_sample box. It ruins the response, though, and on an M1 Pro it takes 2 hours to generate a full response with the 7B model.
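To make the difference between the two modes concrete, here is a plain-Python sketch (no PyTorch or transformers; `pick_next_token` is a hypothetical helper, not the library's code). With sampling on, the next token is drawn from the probabilities, so they have to form a usable distribution. With sampling off, the highest-scoring token is taken directly and nothing is drawn, which would explain why the error no longer appears. It does not mean the underlying values became valid.

```python
import math
import random


def pick_next_token(probs, do_sample, rng=random):
    """Return the index of the next token from a list of probabilities."""
    if not do_sample:
        # Greedy decoding: take the highest-scoring index. Nothing is
        # sampled, so the values are never validated.
        return max(range(len(probs)), key=lambda i: probs[i])
    # Sampling: the weights must be finite and non-negative.
    if any(math.isnan(p) or math.isinf(p) or p < 0 for p in probs):
        raise ValueError("probability list contains inf, nan or element < 0")
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = [0.1, 0.7, 0.2]
print(pick_next_token(probs, do_sample=False))  # always the largest entry
print(pick_next_token(probs, do_sample=True))   # varies from run to run

broken = [float("nan"), 0.5, 0.5]
print(pick_next_token(broken, do_sample=False))  # returns an index, no error
```

Greedy decoding is deterministic and tends to be more repetitive than sampling, which may be part of why the responses look worse with the box unchecked.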

Mar 10 '23 23:03 jacobmlloyd

2 hours!!!

Mar 10 '23 23:03 moelahmady

+1. I get the same error when running on Ubuntu: RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Mar 13 '23 16:03 s1530129650

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.

Apr 13 '23 16:04 github-actions[bot]
