text-generation-webui
Please add OpenChatKit support!
OpenChatKit is now out with the GPT-NeoXT-Chat-Base-20B model.
Please add support for it.
Thank you!
I think it is supported out of the box, and I have tried it without any problems. What error do you encounter?
I'm getting this instantly on an RTX 4080. It doesn't even try to load the model. Normally VRAM takes some time to fill up, so I don't think it's actually running out. I also tried with --auto-devices and --gpu-memory 10:
(textgen) D:\Downloads\text-generation-webui>python server.py --model GPT-NeoXT-Chat-Base-20B --load-in-8bit
Loading GPT-NeoXT-Chat-Base-20B...
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
Traceback (most recent call last):
File "D:\Downloads\text-generation-webui\server.py", line 197, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "D:\Downloads\text-generation-webui\modules\models.py", line 130, in load_model
model = eval(command)
File "<string>", line 1, in <module>
File "C:\ProgramData\Anaconda3\envs\textgen\lib\site-packages\transformers\models\auto\auto_factory.py", line 471, in from_pretrained
return model_class.from_pretrained(
File "C:\ProgramData\Anaconda3\envs\textgen\lib\site-packages\transformers\modeling_utils.py", line 2578, in from_pretrained
raise ValueError(
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you have set a value for `max_memory` you should increase that. To have
an idea of the modules that are set on the CPU or RAM you can print model.hf_device_map.
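For what it's worth, the error's own hint about printing model.hf_device_map can be followed outside the webui. A minimal sketch, assuming the togethercomputer/GPT-NeoXT-Chat-Base-20B checkpoint from Hugging Face and a recent transformers/accelerate/bitsandbytes install:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: load the checkpoint in 8-bit with automatic device placement.
# llm_int8_enable_fp32_cpu_offload lets modules that do not fit in VRAM
# stay on the CPU instead of triggering the ValueError above.
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/GPT-NeoXT-Chat-Base-20B",  # assumed Hugging Face repo id
    device_map="auto",
    quantization_config=quant_config,
)

# Prints which device (GPU index, "cpu", or "disk") each module landed on.
print(model.hf_device_map)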
--load-in-8bit does not work with --auto-devices / memory offloading. Either remove the 8-bit flag and run the model in 16-bit, or manually edit some files: https://github.com/oobabooga/text-generation-webui/issues/193#issuecomment-1461222868
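If you take the 16-bit route, the equivalent call outside the webui looks roughly like this. A minimal sketch; the local model path and memory budget are illustrative, not the webui's actual code:

import torch
from transformers import AutoModelForCausalLM

# Sketch of the 16-bit route: no 8-bit quantization, so accelerate is free
# to split the fp16 weights between the GPU and CPU RAM without the
# ValueError that 8-bit loading raises when layers land off the GPU.
model = AutoModelForCausalLM.from_pretrained(
    "models/GPT-NeoXT-Chat-Base-20B",          # assumed local model path
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "10GiB", "cpu": "30GiB"},   # illustrative budget for a 16GB card
)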
Manually editing those files worked. Cheers
--load-in-8bit does not work with auto devices / memory offloading
This does work now; a PR was merged a week or two ago.