text-generation-webui
Unable to load bigger models in 8-bit mode
Hi, I was able to get everything from the Pygmalion 6B model up to Erebus 13B to load in 8-bit, but for some reason trying to load the 20B Erebus model throws this error:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to:
https://github.com/TimDettmers/bitsandbytes/issues

Traceback (most recent call last):
  File "D:\text-generation-webui\server.py", line 733, in max_memory
[...] you should increase that. To have an idea of the modules that are set on the CPU or RAM you can print model.hf_device_map.
I use this to load the model:
python server.py --cai-chat --auto-devices --no-stream --bf16 --gpu-memory 19 --load-in-8bit
GPU: 4090
OS: Win10
RAM: 64GB
Observations
- If I load another model in 8-bit the same way, for example the big 13B one, it works fine.
- If I load the same 20B one without 8-bit, it loads too, but then it takes too much memory to generate anything.
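For reference, the model.hf_device_map that the error message points at can be printed directly. A minimal sketch, assuming transformers, accelerate and bitsandbytes are installed and a CUDA GPU is available; the model id is just a small placeholder, substitute the one you actually load:

```python
from transformers import AutoModelForCausalLM

# Placeholder model id; use the model you are actually trying to load.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    device_map="auto",   # let accelerate place modules on GPU/CPU
    load_in_8bit=True,   # 8-bit weights via bitsandbytes
)

# Any entry mapped to "cpu" or "disk" is a module that did not fit into the
# GPU budget, which is what the 8-bit loader is complaining about.
print(model.hf_device_map)
```

(On newer transformers versions the 8-bit option is passed through BitsAndBytesConfig / quantization_config rather than the load_in_8bit flag.)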
I also made this mistake
The 4090 has 24 GB of VRAM. I'm not sure what the total size of Erebus 20B is, but it looks to be larger than 24 GB to me just glancing at the file sizes on Hugging Face. Obviously you can try the documented tricks to manage VRAM limitations: https://github.com/oobabooga/text-generation-webui/wiki/Low-VRAM-guide
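For what it's worth, those tricks essentially cap the GPU allocation and let accelerate spill the remaining layers into CPU RAM. A rough transformers-level sketch of what --bf16 --auto-devices --gpu-memory 19 amounts to; the model id and memory figures here are only examples, not taken from the webui code:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "KoboldAI/GPT-NeoX-20B-Erebus",           # example model id
    torch_dtype=torch.bfloat16,               # roughly --bf16
    device_map="auto",                        # roughly --auto-devices
    max_memory={0: "19GiB", "cpu": "48GiB"},  # roughly --gpu-memory 19
)
```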
Yes, like I said, it starts fine when I limit the VRAM, just not in 8-bit mode. And I already use the tricks: python server.py --cai-chat --auto-devices --no-stream --bf16 --gpu-memory 19 --load-in-8bit
I don't think --gpu-memory works together with --load-in-8bit.
Is there a way to tell? Because for smaller models it clearly worked: I checked and the VRAM usage was exactly what I had set. So maybe 8-bit wasn't activated?
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, you can reopen it (if you are the author) or leave a comment below.
--gpu-memory with --load-in-8bit now works thanks to https://github.com/oobabooga/text-generation-webui/pull/358
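For anyone finding this later: combining the two flags boils down to passing the per-device memory cap and the 8-bit option in the same from_pretrained call. A hedged sketch of the rough equivalent, not the PR's actual code, with illustrative values only:

```python
from transformers import AutoModelForCausalLM

# Rough equivalent of: python server.py --load-in-8bit --gpu-memory 19
model = AutoModelForCausalLM.from_pretrained(
    "KoboldAI/GPT-NeoX-20B-Erebus",           # example model id
    device_map="auto",
    load_in_8bit=True,
    max_memory={0: "19GiB", "cpu": "48GiB"},
)
```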