
Please add OpenChatKit support!

Open jkyndir opened this issue 1 year ago • 4 comments

OpenChatKit is now out with the GPT-NeoXT-Chat-Base-20B model. Please add support for it. Thank you!

jkyndir avatar Mar 15 '23 12:03 jkyndir

I think it is supported out of the box; I have tried it without any problem. What error are you encountering?
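
For anyone who wants to sanity-check the model outside the webui, here is a minimal sketch using plain transformers. The togethercomputer/GPT-NeoXT-Chat-Base-20B repo id and the <human>/<bot> prompt format are assumptions based on the OpenChatKit release, not something confirmed in this thread:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/GPT-NeoXT-Chat-Base-20B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" lets accelerate spread the 20B weights across GPU and CPU RAM
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "<human>: Hello, who are you?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))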

sgsdxzy avatar Mar 15 '23 13:03 sgsdxzy

I'm getting this instantly on an RTX 4080. It doesn't even try to load the model; normally it takes some time to fill up VRAM, so I don't think it's actually running out. I tried with --auto-devices and --gpu-memory 10:

(textgen) D:\Downloads\text-generation-webui>python server.py --model GPT-NeoXT-Chat-Base-20B --load-in-8bit
Loading GPT-NeoXT-Chat-Base-20B...

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
Traceback (most recent call last):
  File "D:\Downloads\text-generation-webui\server.py", line 197, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "D:\Downloads\text-generation-webui\modules\models.py", line 130, in load_model
    model = eval(command)
  File "<string>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\envs\textgen\lib\site-packages\transformers\models\auto\auto_factory.py", line 471, in from_pretrained
    return model_class.from_pretrained(
  File "C:\ProgramData\Anaconda3\envs\textgen\lib\site-packages\transformers\modeling_utils.py", line 2578, in from_pretrained
    raise ValueError(
ValueError:
                        Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
                        the quantized model. If you have set a value for `max_memory` you should increase that. To have
                        an idea of the modules that are set on the CPU or RAM you can print model.hf_device_map.

Tophness avatar Mar 16 '23 07:03 Tophness

I'm getting this instantly on an RTX 4080. It doesn't even try to load the model. [same command and traceback as above]

--load-in-8bit does not work with --auto-devices / memory offloading. Either remove the 8-bit flag and run the model in 16-bit, or manually edit some files as described here: https://github.com/oobabooga/text-generation-webui/issues/193#issuecomment-1461222868
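
For reference, the 16-bit route is the same command minus the 8-bit flag, with the flags already tried above (a sketch, not verified here):

(textgen) D:\Downloads\text-generation-webui>python server.py --model GPT-NeoXT-Chat-Base-20B --auto-devices --gpu-memory 10

--gpu-memory 10 caps GPU usage at roughly 10 GiB and offloads the remaining layers to CPU RAM, which is exactly the offloading that conflicted with the 8-bit flag at the time.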

sgsdxzy avatar Mar 16 '23 07:03 sgsdxzy

Manually editing those files worked. Cheers

Tophness avatar Mar 16 '23 08:03 Tophness

--load-in-8bit does not work with --auto-devices / memory offloading

This does work now; a PR from a week or two ago fixed it.
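
If so, the combination that failed above should load now, e.g. (untested here, flags taken from the earlier messages):

(textgen) D:\Downloads\text-generation-webui>python server.py --model GPT-NeoXT-Chat-Base-20B --load-in-8bit --gpu-memory 10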

oobabooga avatar Mar 29 '23 03:03 oobabooga