Error during response generation on RTX 5090
Describe the bug
I'm getting the following error when trying to use Oobabooga on a 5090 card. All libraries have been manually updated as needed around PyTorch 2.7 on CUDA 12.8. The UI loads, I can ask a question, and I start to get a snippet of a response, but after just a few words the console crashes with the error below and the answer stops.
```
AI:
11:37:59-204916 INFO     WARPERS= ['MinPLogitsWarper']
Traceback (most recent call last):
  File "C:\AI-Content\text-generation-webui\text-generation-webui\modules\text_generation.py", line 445, in generate_reply_HF
    new_content = get_reply_from_output_ids(output, state, starting_from=starting_from)
  File "C:\AI-Content\text-generation-webui\text-generation-webui\modules\text_generation.py", line 266, in get_reply_from_output_ids
    reply = decode(output_ids[starting_from:], state['skip_special_tokens'] if state else True)
  File "C:\AI-Content\text-generation-webui\text-generation-webui\modules\text_generation.py", line 176, in decode
    return shared.tokenizer.decode(output_ids, skip_special_tokens=skip_special_tokens)
  File "C:\AI-Content\text-generation-webui\text-generation-webui\installer_files\env\Lib\site-packages\transformers\tokenization_utils_base.py", line 3860, in decode
    return self._decode(
  File "C:\AI-Content\text-generation-webui\text-generation-webui\installer_files\env\Lib\site-packages\transformers\tokenization_utils_fast.py", line 668, in _decode
    text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
OverflowError: can't convert negative int to unsigned
Output generated in 0.29 seconds (10.40 tokens/s, 3 tokens, context 479, seed 68386361)
```
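For what it's worth, the OverflowError itself is easy to reproduce outside the webui: a fast (Rust-backed) Hugging Face tokenizer only accepts unsigned token ids, so a single negative value anywhere in output_ids kills the decode. A minimal sketch of just that failure mode (gpt2 stands in purely for illustration; any model with a fast tokenizer should behave the same):

```python
# Minimal sketch of the failure mode, independent of the webui.
# Assumption: gpt2 stands in for any model with a fast (Rust-backed)
# tokenizer; they all reject negative token ids the same way.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

print(tokenizer.decode([15496, 995]))  # valid ids decode normally
print(tokenizer.decode([-1]))          # OverflowError: can't convert negative int to unsigned
```

That suggests the tokenizer is just the messenger: something upstream in the generation path is producing a negative token id on this hardware/software combination.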
Is there an existing issue for this?
- [x] I have searched the existing issues
Reproduction
Load the UI and ask a question.
Screenshot
No response
Logs
The full traceback is included in the description above.
System Info
RTX 5090 on Windows 11
Some additional information: after testing a bit further, this error only seems to happen with GPTQ-format models (my preference due to speed). When I loaded the same model in GGUF format, everything worked.
Did a little more digging: GPTQ models work when loaded with the ExLlamav2 loader. The error above only happens with ExLlamav2_HF, which worked fine before the 5090 upgrade and the associated PyTorch 2.7 / CUDA 12.8 updates needed to get everything running again.
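Until the root cause in the ExLlamav2_HF path is sorted out, the decode step itself can be guarded so one corrupt sample doesn't abort the whole reply. This is a hypothetical local workaround sketch, not an upstream fix; the function and module names mirror the traceback above (modules/text_generation.py, shared.tokenizer):

```python
# Hypothetical workaround sketch (not an upstream fix): drop negative ids
# before decoding so a single bad sample can't crash get_reply_from_output_ids.
# This mirrors decode() in modules/text_generation.py; `shared` is the
# webui's existing modules.shared.
from modules import shared


def decode(output_ids, skip_special_tokens=True):
    # Fast (Rust-backed) tokenizers require unsigned ids; anything negative
    # here is garbage from the sampler, so skip it instead of crashing.
    safe_ids = [int(i) for i in output_ids if int(i) >= 0]
    return shared.tokenizer.decode(safe_ids, skip_special_tokens=skip_special_tokens)
```

This only masks the symptom, of course; if the sampler is emitting negative ids, the generated text is probably still wrong at that point.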
Can you share what changes you made to get it working on your 5090?