
Unable to load bigger models in 8-bit mode

Open Manimap opened this issue 2 years ago • 5 comments

Hi, I was able to load everything from the Pygmalion 6B model up to Erebus 13B in 8-bit, but for some reason trying to load the 20B Erebus model throws this error:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

Traceback (most recent call last):
  File "D:\text-generation-webui\server.py", line 733, in <module>
    model, tokenizer = load_model(model_name)
  File "D:\text-generation-webui\server.py", line 141, in load_model
    model = eval(command)
  File "<string>", line 1, in <module>
  File "C:\Users\USER\miniconda3\envs\textgen\lib\site-packages\transformers\models\auto\auto_factory.py", line 463, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\USER\miniconda3\envs\textgen\lib\site-packages\transformers\modeling_utils.py", line 2326, in from_pretrained
    raise ValueError(
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you have set a value for max_memory you should increase that. To have an idea of the modules that are set on the CPU or RAM you can print model.hf_device_map.

I use this command to load the model:

    python server.py --cai-chat --auto-devices --no-stream --bf16 --gpu-memory 19 --load-in-8bit

GPU: 4090
OS: Win10
RAM: 64GB
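For reference, here is a minimal sketch of what the error message suggests checking. It calls the transformers API directly rather than going through server.py, and the repo name and settings are illustrative assumptions, not taken from the log:

    # Sketch: build the device map without 8-bit first and inspect which modules
    # land on CPU/disk; the same placement with load_in_8bit=True raises the
    # ValueError above, since the 8-bit weights could not be offloaded at the time.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "KoboldAI/GPT-NeoX-20B-Erebus",  # assumed Hugging Face repo for Erebus 20B
        device_map="auto",               # let accelerate spread layers over GPU/CPU
        torch_dtype="auto",
    )
    print(model.hf_device_map)  # entries mapped to "cpu" or "disk" won't fit in 8-bit mode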

Observations

  • If I load another model in 8-bit the same way, for example the 13B one, it works fine.
  • If I load the same 20B model without 8-bit, it loads too, but then it takes too much memory to generate anything.

Manimap avatar Feb 08 '23 11:02 Manimap

I ran into this error as well.

WhiteZz1 avatar Feb 09 '23 08:02 WhiteZz1

The 4090 has 24 GB of VRAM. I'm not sure what the total size of Erebus 20B is, but glancing at the file sizes on Hugging Face it looks larger than 24 GB to me. You can of course try the documented tricks for managing VRAM limitations: https://github.com/oobabooga/text-generation-webui/wiki/Low-VRAM-guide
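A rough back-of-the-envelope check supports this (the parameter count is approximate and the overhead is a guess, not a measurement):

    # ~20B parameters: 1 byte/param in 8-bit, 2 bytes/param in fp16
    params = 20e9
    print(params * 1 / 1024**3)  # ~18.6 GB of weights in int8
    print(params * 2 / 1024**3)  # ~37.3 GB of weights in fp16

So even in 8-bit the weights alone are close to 19 GB, and activations, the KV cache, and bitsandbytes overhead come on top of that, which is why a --gpu-memory 19 cap (or even the full 24 GB) can be too tight.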

Spencer-Dawson avatar Feb 12 '23 02:02 Spencer-Dawson

> The 4090 has 24 GB of VRAM. I'm not sure what the total size of Erebus 20B is, but glancing at the file sizes on Hugging Face it looks larger than 24 GB to me. You can of course try the documented tricks for managing VRAM limitations: https://github.com/oobabooga/text-generation-webui/wiki/Low-VRAM-guide

Yes, like I said, it starts fine when I limit VRAM, but not in 8-bit mode. And I already use the tricks: python server.py --cai-chat --auto-devices --no-stream --bf16 --gpu-memory 19 --load-in-8bit

Manimap avatar Feb 13 '23 11:02 Manimap

I don't think --gpu-memory works together with --load-in-8bit.

oobabooga avatar Feb 13 '23 13:02 oobabooga

> I don't think --gpu-memory works together with --load-in-8bit.

Is there a way to tell? Because for smaller models it clearly worked: I checked, and VRAM usage matched exactly what I had set. Or maybe 8-bit wasn't activated?
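One way to tell, sketched against the transformers API directly (the small model is purely illustrative, and the is_loaded_in_8bit attribute name assumes a recent transformers version):

    # Check whether 8-bit loading actually took effect: bitsandbytes stores the
    # linear-layer weights as int8, so int8 should appear among the parameter dtypes.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-1.3b",   # illustrative small model, not the one from this issue
        device_map="auto",
        load_in_8bit=True,
    )
    print(getattr(model, "is_loaded_in_8bit", False))  # True if 8-bit was applied
    print({p.dtype for p in model.parameters()})       # expect torch.int8 among the dtypes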

Manimap avatar Feb 13 '23 20:02 Manimap

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, you can reopen it (if you are the author) or leave a comment below.

github-actions[bot] avatar Mar 15 '23 23:03 github-actions[bot]

--gpu-memory with --load-in-8bit now works thanks to https://github.com/oobabooga/text-generation-webui/pull/358
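Roughly speaking, the fix amounts to passing a per-device max_memory map alongside load_in_8bit when calling from_pretrained. A sketch with illustrative values, not the exact code from the PR:

    # Combine a GPU memory cap with 8-bit loading; roughly what --gpu-memory 19 requests.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "KoboldAI/GPT-NeoX-20B-Erebus",            # assumed repo name
        device_map="auto",
        load_in_8bit=True,
        max_memory={0: "19GiB", "cpu": "64GiB"},   # cap GPU 0 at 19 GiB, spill the rest to system RAM
    )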

oobabooga avatar Mar 16 '23 20:03 oobabooga