text-generation-webui "RuntimeError: probability tensor contains either `inf`, `nan` or element < 0" 8bit mode not working 1070 Ti.

Describe the bug

I was able to get the smaller models working (OPT 1.3b, Pygmalion 1.3b), but as suspected they were lackluster. I'm on Windows 10 with a 1070 Ti with 8 GB of VRAM and thought there was no chance of running the bigger models, but there are all these Reddit posts on halving the model size with 8-bit/bitsandbytes:

https://www.reddit.com/r/MachineLearning/comments/11kwdu9/d_tutorial_run_llama_on_8gb_vram_on_windows/
https://www.reddit.com/r/PygmalionAI/comments/10r4ua5/anyone_run_locally_and_tried_to_use_8bit_on_older/
https://www.reddit.com/r/PygmalionAI/comments/10o0dfp/model_8bit_optimization_through_wsl/

I followed them and am getting this error when trying to generate text:

To create a public link, set share=True in launch().
D:\oobabooga\one-click-installers-oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py:195: UserWarning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorCompare.cpp:413.)
  attn_weights = torch.where(causal_mask, attn_weights, mask_value)
Exception in thread Thread-3 (gentask):
Traceback (most recent call last):
  File "D:\oobabooga\one-click-installers-oobabooga-windows\installer_files\env\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "D:\oobabooga\one-click-installers-oobabooga-windows\installer_files\env\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\oobabooga\one-click-installers-oobabooga-windows\text-generation-webui\modules\callbacks.py", line 65, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "D:\oobabooga\one-click-installers-oobabooga-windows\text-generation-webui\modules\text_generation.py", line 199, in generate_with_callback
    shared.model.generate(**kwargs)
  File "D:\oobabooga\one-click-installers-oobabooga-windows\installer_files\env\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\oobabooga\one-click-installers-oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1452, in generate
    return self.sample(
  File "D:\oobabooga\one-click-installers-oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2504, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0

Is this just a bug, or am I unable to use bitsandbytes? The only other 10-series card I have seen "working" was this guy with a 1080 Ti on Dreambooth (his fix did not work for me): https://github.com/james-things/bitsandbytes-prebuilt-all_arch

The Feb 2nd bitsandbytes update says "Int8 Matmul backward for all GPUs", and going by the hardware section I would seem to meet the requirements (an NVIDIA Kepler GPU or newer; a GPU from 2018 or older)? However, a blog post also stated:

Support for Kepler GPUs (GTX 1080 etc)

While we support all GPUs from the past four years, some old GPUs like GTX 1080 still see heavy use. While these GPUs do not have Int8 tensor cores, they do have Int8 vector units (a kind of "weak" tensor core). As such, these GPUs can also experience Int8 acceleration. However, it requires a entire different stack of software for fast inference. While we do plan to integrate support for Kepler GPUs to make the LLM.int8() feature more widely available, it will take some time to realize this due to its complexity.
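For anyone wanting to check which side of that line their card falls on, here is a minimal sketch. It assumes the usual rule of thumb that Int8 tensor cores start with Turing (CUDA compute capability 7.5); the 10-series cards are Pascal at 6.1. The helper names `has_int8_tensor_cores` and `describe_current_gpu` are made up for illustration and are not part of bitsandbytes or the webui.

```python
# Sketch only: classify a GPU by CUDA compute capability.
# Assumption: Int8 tensor cores are available from Turing (7.5) upward.

def has_int8_tensor_cores(major: int, minor: int) -> bool:
    """Return True if the compute capability is Turing (7.5) or newer."""
    return (major, minor) >= (7, 5)


def describe_current_gpu() -> str:
    """Report the first visible CUDA device; needs PyTorch with CUDA support."""
    import torch  # imported lazily so the helper above works without PyTorch

    if not torch.cuda.is_available():
        return "no CUDA device visible"
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    verdict = "has" if has_int8_tensor_cores(major, minor) else "lacks"
    return f"{name} (compute capability {major}.{minor}) {verdict} Int8 tensor cores"


if __name__ == "__main__":
    # A GTX 1070 Ti reports compute capability 6.1.
    print(has_int8_tensor_cores(6, 1))
```

Calling `describe_current_gpu()` on the affected machine gives the capability that the 8-bit code path has to cope with.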

I have tried re-installing everything manually with big boy anaconda, but had no luck and just ended up with the same error message.

Is there an existing issue for this?

[X] I have searched the existing issues

Reproduction

1. Enable load in 8 bit mode.
2. Launch oobabooga.
3. Load any model.
4. Attempt to generate text.

Result: nothing is generated, only RuntimeError: probability tensor contains either inf, nan or element < 0

Screenshot

Logs

To create a public link, set `share=True` in `launch()`.
D:\oobabooga\one-click-installers-oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py:195: UserWarning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorCompare.cpp:413.)
  attn_weights = torch.where(causal_mask, attn_weights, mask_value)
Exception in thread Thread-3 (gentask):
Traceback (most recent call last):
  File "D:\oobabooga\one-click-installers-oobabooga-windows\installer_files\env\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "D:\oobabooga\one-click-installers-oobabooga-windows\installer_files\env\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\oobabooga\one-click-installers-oobabooga-windows\text-generation-webui\modules\callbacks.py", line 65, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "D:\oobabooga\one-click-installers-oobabooga-windows\text-generation-webui\modules\text_generation.py", line 199, in generate_with_callback
    shared.model.generate(**kwargs)
  File "D:\oobabooga\one-click-installers-oobabooga-windows\installer_files\env\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\oobabooga\one-click-installers-oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1452, in generate
    return self.sample(
  File "D:\oobabooga\one-click-installers-oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2504, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
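The last frame of the traceback is `torch.multinomial(probs, ...)`, which refuses to sample when the probabilities are not finite and non-negative. A plausible reading (an assumption, not something confirmed in this thread) is that the 8-bit forward pass produced `nan` or `inf` logits and the softmax spread them over the whole distribution. The pure-Python sketch below only mimics that check; `softmax` and `is_valid_probs` are illustrative names, not PyTorch functions.

```python
import math

# Sketch only: show how one bad logit poisons every probability,
# which is the condition the RuntimeError above complains about.

def softmax(logits):
    """Plain softmax over a list of floats, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_valid_probs(probs):
    """Same condition as in the error message: no inf, no nan, no element < 0."""
    return all(math.isfinite(p) and p >= 0 for p in probs)


if __name__ == "__main__":
    print(is_valid_probs(softmax([1.0, 2.0, 3.0])))           # healthy logits
    print(is_valid_probs(softmax([1.0, float("nan"), 3.0])))  # one nan logit
```

If this is what is happening, the sampling settings are not the culprit; the logits are already broken before sampling starts.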

System Info

Windows 10, EVGA Geforce 1070 Ti.

Mar 23 '23 07:03 DaBajan

Change the int8 threshold and it might work. Try something like 0.5-1.5. See the hint: https://github.com/oobabooga/text-generation-webui/pull/198

It works for me on Linux and should work the same on Windows, but I have 24g of RAM.

Mar 23 '23 10:03 Ph0rk0z

I'm guessing that you used the libbitsandbytes_cuda116.dll linked in that first guide? If so then you should use libbitsandbytes_cudaall.dll. It has better compatibility with older GPUs. It works on my 1080ti.

Mar 23 '23 12:03 jllllll

> change int threshold and it might work. do like 0.5-1.5 see hint: #198
>
> it works for me on linux, should work the same on windows.. but I have 24g of ram

Is this right?

python server.py --auto-devices --load-in-8bit --cai-chat --int8-threshold %1

I am getting this error when launching:

server.py: error: unrecognized arguments: --int8-threshold 1
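That message is what argparse prints when a flag was never registered with the parser, so it suggests this copy of server.py simply has no `--int8-threshold` option. The sketch below is a stand-in parser, not the real server.py: the three existing flags are taken from the command above, while the `--int8-threshold` definition and its 6.0 default are assumptions for illustration.

```python
import argparse

# Sketch only: a toy parser showing why the flag is rejected until it is defined.

def build_parser(with_threshold: bool) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="server.py")
    parser.add_argument("--auto-devices", action="store_true")
    parser.add_argument("--load-in-8bit", action="store_true")
    parser.add_argument("--cai-chat", action="store_true")
    if with_threshold:
        # Hypothetical definition; the real project may name or type it differently.
        parser.add_argument("--int8-threshold", type=float, default=6.0)
    return parser


if __name__ == "__main__":
    argv = ["--auto-devices", "--load-in-8bit", "--cai-chat", "--int8-threshold", "1"]

    # Without the definition, the flag and its value come back as leftovers.
    _, unknown = build_parser(with_threshold=False).parse_known_args(argv)
    print(unknown)

    # With the definition, the value is parsed as a float.
    args = build_parser(with_threshold=True).parse_args(argv)
    print(args.int8_threshold)
```

So no package version will make the flag appear; the argument has to exist in the script being launched.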

Mar 23 '23 20:03 DaBajan

What are the relevant packages that support the --int8-threshold argument? Is this the right version of pytorch?

python         3.9.16   h6244533_2
pytorch        2.0.0    py3.9_cuda11.7_cudnn8_0   pytorch
pytorch-cuda   11.7     h16d0643_3                pytorch
pytorch-mutex  1.0      cuda                      pytorch

Is something wrong with cuda?

cuda-cccl           12.1.55   0   nvidia
cuda-cudart         11.7.99   0   nvidia
cuda-cudart-dev     11.7.99   0   nvidia
cuda-cupti          11.7.101  0   nvidia
cuda-libraries      11.7.1    0   nvidia
cuda-libraries-dev  11.7.1    0   nvidia
cuda-nvrtc          11.7.99   0   nvidia
cuda-nvrtc-dev      11.7.99   0   nvidia
cuda-nvtx           11.7.91   0   nvidia
cuda-runtime        11.7.1    0   nvidia

Mar 23 '23 20:03 DaBajan

You will have to add it to where it loads the model inside of models.py and pray.

Mar 23 '23 21:03 Ph0rk0z

> You will have to add it to where it loads the model inside of models.py and pray.

What should I be adding to models.py?

Mar 24 '23 00:03 DaBajan

I do it like this: https://github.com/Ph0rk0z/text-generation-webui-testing/commit/ecad08f54c3282356888ee8f4dbf112cb331544a
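For readers who cannot open the link, here is a rough sketch of one way to pass a lower threshold when loading in 8-bit with Hugging Face transformers. It is not a copy of the linked commit, and I am assuming the `BitsAndBytesConfig` route (`load_in_8bit`, `llm_int8_threshold`) is available in the installed transformers version; `build_quant_kwargs` and `load_8bit_model` are illustrative names, and the value 1.0 is just taken from the 0.5-1.5 range suggested earlier in the thread.

```python
# Sketch only: pass a custom LLM.int8() outlier threshold at model load time.

def build_quant_kwargs(threshold: float = 6.0) -> dict:
    """Collect the 8-bit settings; the threshold is cast because the config expects a float."""
    return {"load_in_8bit": True, "llm_int8_threshold": float(threshold)}


def load_8bit_model(model_name: str, threshold: float = 1.0):
    """Load a causal LM in 8-bit; needs transformers, accelerate and bitsandbytes installed."""
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig  # lazy import

    quant_config = BitsAndBytesConfig(**build_quant_kwargs(threshold))
    return AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        quantization_config=quant_config,
    )


if __name__ == "__main__":
    print(build_quant_kwargs(1))
```

A lower threshold sends more of the activations through the fp16 path instead of the int8 one, which is presumably why it helps on cards without Int8 tensor cores, at the cost of some of the memory and speed benefit.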

Mar 24 '23 00:03 Ph0rk0z

> I do it like this: Ph0rk0z@ecad08f

That worked, you the man 👍

Mar 24 '23 01:03 DaBajan
