
Fails to load model

Userbingd opened this issue 1 year ago • 28 comments

Describe the bug

I did just about everything in the low VRAM guide and it still fails, with the same message every time. I'm using this model: gpt4-x-alpaca-13b-native-4bit-128g.

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

Attempt to load model

Screenshot

No response

Logs

Traceback (most recent call last):
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\text-generation-webui\server.py", line 84, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\text-generation-webui\modules\models.py", line 103, in load_model
model = load_quantized(model_name)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 151, in load_quantized
model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 32, in _load_quant
model = AutoModelForCausalLM.from_config(config)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\auto\auto_factory.py", line 411, in from_config
return model_class._from_config(config, **kwargs)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 1146, in _from_config
model = cls(config, **kwargs)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 614, in __init__
self.model = LlamaModel(config)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in __init__
self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in <listcomp>
self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 255, in __init__
self.self_attn = LlamaAttention(config=config)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 176, in __init__
self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=False)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\linear.py", line 96, in __init__
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 52428800 bytes.

System Info

Windows 10
Intel Core i7-860
16GB DDR3 low density RAM
GTX 1650 SUPER

Userbingd avatar Apr 14 '23 18:04 Userbingd

I have the same error.

greynutella avatar Apr 14 '23 19:04 greynutella

4-bit models get loaded into RAM before being sent to VRAM. You need to have enough free RAM for it to load, or it will just fail. This model will require at least 10gb of unused RAM to load.
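
If you want to rule this out quickly, here is a minimal sketch (assuming psutil is installed via pip install psutil) to check free RAM right before loading:

import psutil

required = 10 * 1024**3  # ~10 GB free, per the estimate above

avail = psutil.virtual_memory().available
print(f"Available RAM: {avail / 1024**3:.1f} GB")
if avail < required:
    print("Not enough free RAM to stage the model; close other programs first.")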

jllllll avatar Apr 14 '23 20:04 jllllll

4-bit models get loaded into RAM before being sent to VRAM. You need to have enough free RAM for it to load, or it will just fail. This model will require at least 10gb of unused RAM to load.

I have 32gb and getting the same error, is that not enough?

For reference, here's my specs:

Windows 11
Intel Core i5-10400F
32GB DDR4 RAM
Nvidia Geforce RTX 3060 (12GB)

kickturn avatar Apr 15 '23 03:04 kickturn

@kickturn That should be plenty. Your ram and vram amounts are essentially the same as my system and I'm not getting this issue.

What arguments are you using to launch the webui?

jllllll avatar Apr 15 '23 03:04 jllllll

@jllllll Only --auto-devices --chat --gpu-memory 11. I removed and switched some parameters, with no luck.

kickturn avatar Apr 15 '23 04:04 kickturn

@kickturn --auto-devices and --gpu-memory don't apply to 4-bit models.

Watch your ram allocation in task manager while you try to load the model. Does your ram fill up completely?

jllllll avatar Apr 15 '23 04:04 jllllll

@jllllll Nope; in fact, it did not change, and half of it stayed free. I checked both Resource Monitor and Task Manager.

I will try different models later and see what I get

kickturn avatar Apr 15 '23 04:04 kickturn

Did you load the model without using --wbits 4 --groupsize 128?
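
For reference, those flags go on the launch command; the model name here is just an example:

python server.py --model gpt4-x-alpaca-13b-native-4bit-128g --wbits 4 --groupsize 128 --chat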

kuso-ge avatar Apr 15 '23 13:04 kuso-ge

Same error with 16 GB RAM.

westraven avatar Apr 16 '23 22:04 westraven

Here's mine, same error. Using: --auto-devices --chat --model-menu --wbits 4 --groupsize 128

Error log:

Loading vicuna-13b-GPTQ-4bit-128g...
Found the following quantized model: models\vicuna-13b-GPTQ-4bit-128g\vicuna-13b-4bit-128g.safetensors
Traceback (most recent call last):
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\text-generation-webui\server.py", line 905, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\text-generation-webui\modules\models.py", line 117, in load_model
model = load_quantized(model_name)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 172, in load_quantized
model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 40, in _load_quant
model = AutoModelForCausalLM.from_config(config)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\auto\auto_factory.py", line 411, in from_config
return model_class._from_config(config, **kwargs)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 1146, in _from_config
model = cls(config, **kwargs)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 614, in __init__
self.model = LlamaModel(config)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in __init__
self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in <listcomp>
self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 256, in __init__
self.mlp = LlamaMLP(
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 151, in __init__
self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\linear.py", line 96, in __init__
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 141557760 bytes.

Error when loading the web UI without a model, then trying to enable it from the "Models" tab (same error, different amount of memory in the last line):

Traceback (most recent call last):
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\text-generation-webui\server.py", line 85, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\text-generation-webui\modules\models.py", line 117, in load_model
model = load_quantized(model_name)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 172, in load_quantized
model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 40, in _load_quant
model = AutoModelForCausalLM.from_config(config)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\auto\auto_factory.py", line 411, in from_config
return model_class._from_config(config, **kwargs)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 1146, in _from_config
model = cls(config, **kwargs)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 614, in __init__
self.model = LlamaModel(config)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in __init__
self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in <listcomp>
self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 255, in __init__
self.self_attn = LlamaAttention(config=config)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 178, in __init__
self.v_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=False)
File "E:\Proyectos_AI\ChatGPT\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\linear.py", line 96, in __init__
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 52428800 bytes.

PC Specs:

Windows 11 22H2
Intel Core i7-12700F
16GB DDR4 RAM
Nvidia Geforce RTX 3060ti (8GB)

Csf91 avatar Apr 17 '23 04:04 Csf91

I've done some more tests and still can't load it. No matter what I do. Whether it's CPU or GPU, it fails. Even setting a huge paging file doesn't help. I even tried loading pygmalion, and that fails as well, but unlike gpt4xalpaca it doesn't tell me exactly what went wrong.

Userbingd avatar Apr 18 '23 15:04 Userbingd

I've done some more tests and still can't load it. No matter what I do. Whether it's CPU or GPU, it fails. Even setting a huge paging file doesn't help. I even tried loading pygmalion, and that fails as well, but unlike gpt4xalpaca it doesn't tell me exactly what went wrong.

Same here. No matter what I've tried, the only thing that changes is the amount of memory in the last line, as shown in my last message. Maybe it is installing the wrong version of some dependencies, even when I manually run the install from requirements.txt?

Csf91 avatar Apr 18 '23 19:04 Csf91

I doubt that's what is happening. I've never once had this issue no matter how I install the webui. I can't see anything at all in any of the reports that hints at what the cause could be.

jllllll avatar Apr 18 '23 19:04 jllllll

Having the same issue. 16gb of RAM, 2080ti. RAM might be slightly cluttered, but it's not even attempting to fill it.

made9 avatar Apr 19 '23 18:04 made9

4-bit models get loaded into RAM before being sent to VRAM. You need to have enough free RAM for it to load, or it will just fail. This model will require at least 10gb of unused RAM to load.

Is there any way to change this? I only have 8GB in my laptop and sadly can't open it because of the warranty, let alone have 10 GB free.

AlexK-TUES avatar Apr 22 '23 14:04 AlexK-TUES

I'm trying to use anon8231489123/vicuna-13b-GPTQ-4bit-128g.
Specs: 5600H, 8 GB RAM, 1650 Mobile. Memory jumps from 50% to 63% (Firefox is open and uses 1 GB, plus Tiny11 using around 3 GB). Although 8 GB of RAM may not be enough to run the model, I'm surprised it hit 63% and not 100% while trying to fill the RAM (if it even tried). Is 8 GB RAM and 4 GB VRAM enough to load and run a model (even if it responds relatively slowly), or shall I give up on trying to run anything?

AlexK-TUES avatar Apr 22 '23 15:04 AlexK-TUES

@Alexs4572 Your best bet is to use 7B models or smaller. That 13B model uses way more than 4 GB of VRAM; most 13B 4-bit models use 7-9 GB of VRAM.
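
A rough back-of-envelope sketch of where that number comes from (figures are approximate):

params = 13e9                            # 13B parameters
weight_gib = params * 4 / 8 / 1024**3    # 4 bits per weight
print(f"{weight_gib:.1f} GiB")           # ~6.1 GiB for the weights alone
# Group-size scales/zeros, the KV cache, and CUDA overhead
# push real-world usage to roughly 7-9 GB.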

jllllll avatar Apr 22 '23 16:04 jllllll

@Alexs4572 Your best bet is to use 7B models or smaller. That 13B model uses way more than 4 GB of VRAM; most 13B 4-bit models use 7-9 GB of VRAM.

I've tried with https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g/tree/main as well; same exact result. The VRAM and RAM aren't even trying to load it. It seems like a memory allocation issue to me, not the hardware lacking. Here's why: [screenshot]

The image in the background is AI-generated using A1111 and the 2dn1 safetensors model, but I've also used bigger models without any kind of issue at all. When they load, both RAM and VRAM go up in usage, sometimes even up to 100% if the model is too big for my PC, and then it crashes due to lack of memory (just explaining the behaviour it usually has when it actually lacks resources).

Now, on to the actual issue here: this isn't even attempting to load anything into memory other than the applet/launcher itself. When it starts to load, you can see a peak in the clocks for the GPU memory and a small peak in the PC's RAM, which is just the applet loading. From there it's just a flatline as it fails to load the model into memory.

Also, the graphs are small because I reset them just before opening it, so they only show data from the moment I opened it until the crash/bug.

Having the same issue. 16gb of RAM, 2080ti. RAM might be slightly cluttered, but it's not even attempting to fill it.

Is your GPU the 8GB or the 11GB version? I edited the message since I noticed there are two versions of it.

I hope this helps :)

Csf91 avatar Apr 23 '23 17:04 Csf91

Mine is the 11GB version. To be honest, I didn't even know there was an 8GB version of that thing, lol. But yeah, I have the same issue as you, and I can also use Stable Diffusion without issues. Just to check, I loaded the non-4-bit version of the same model into the webUI, and that one DID attempt to load and filled up my RAM and VRAM. So this error is related to 4-bit models on my side.

made9 avatar Apr 23 '23 23:04 made9

Quick update for everyone in the thread: tried this and it seems to load now, though it does an insane amount of memory allocation in the pagefile (at some point up to 50 GB)! I had it set to a maximum of around 10 GB, but it needed to be set as "Managed by the O.S.".

See here for details: https://youtu.be/FOyqcETVUCs?t=564 (9:24). Got it to run by doing that; proof: [screenshot]

Can't believe it was such a stupid thing, especially when I had it set at quite a high amount already...

Another one showing that it works, from the webui: [screenshot]

Csf91 avatar Apr 24 '23 05:04 Csf91

My problem definitely isn't the page file being too small. I have a page file on a drive with nearly 500GB free, set to system managed. I can never get it to run, even with that.

Userbingd avatar Apr 24 '23 18:04 Userbingd

@Userbingd Setting it to system managed is exactly why it is not big enough. Manually set it to 10 or 20GB.

System managed just means that Windows will decide when it needs to be resized. The issue is that this resizing does not happen instantly and the system does not always know how much memory in total is trying to be loaded. This results in Windows thinking that adding 500MB is enough when you actually need 5GB+ to be added.
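
If you want to see what Windows has actually committed, psutil reports the pagefile as swap; a minimal sketch, assuming psutil is installed:

import psutil

swap = psutil.swap_memory()  # on Windows this reflects the pagefile
print(f"Pagefile total: {swap.total / 1024**3:.1f} GB, used: {swap.used / 1024**3:.1f} GB")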

jllllll avatar Apr 24 '23 19:04 jllllll

@Userbingd Setting it to system managed is exactly why it is not big enough. Manually set it to 10 or 20GB.

@jllllll System managed just means that Windows will decide when it needs to be resized. The issue is that this resizing does not happen instantly and the system does not always know how much memory in total is trying to be loaded. This results in Windows thinking that adding 500MB is enough when you actually need 5GB+ to be added.

Regarding that, mine was originally at 10 GB and that seemed not to be enough, so I'd try with at least 20.

Csf91 avatar Apr 24 '23 19:04 Csf91

I was able to get pygmalion to "load", but it gives me this error when I try to generate anything:

Traceback (most recent call last):
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\text-generation-webui\modules\callbacks.py", line 66, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\text-generation-webui\modules\text_generation.py", line 231, in generate_with_callback
shared.model.generate(**kwargs)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1508, in generate
return self.sample(
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2547, in sample
outputs = self(
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 852, in forward
transformer_outputs = self.transformer(
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 687, in forward
outputs = block(
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\gptj\modeling_gptj.py", line 307, in forward
hidden_states = self.ln_1(hidden_states)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\normalization.py", line 190, in forward
return F.layer_norm(
File "G:\Oobabooga Text UI\oobabooga-windows\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
Output generated in 1.76 seconds (0.00 tokens/s, 0 tokens, context 32)

Userbingd avatar Apr 25 '23 00:04 Userbingd

Never mind, I turned CPU off and am no longer getting that... now I'm getting: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select). I can turn CPU off in the webUI, but not from the launcher args... it refuses to run on my GPU.
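
For what it's worth, the "Half" error above generally means fp16 kernels are not implemented on CPU, so CPU inference needs float32 weights. A minimal sketch of what that looks like outside the webui (the checkpoint name is only an example):

import torch
from transformers import AutoModelForCausalLM

# fp16 LayerNorm has no CPU kernel, so force float32 when running on CPU.
model = AutoModelForCausalLM.from_pretrained(
    "PygmalionAI/pygmalion-6b",  # example checkpoint
    torch_dtype=torch.float32,
)
model.to("cpu")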

Userbingd avatar Apr 25 '23 00:04 Userbingd

I'm also getting this. 64 GB system RAM and 24 GB VRAM. The model is around 18 GB, and I have more than enough memory of both types to handle it.

MillionthOdin16 avatar Apr 27 '23 03:04 MillionthOdin16

I'm starting to think it's a problem with some unknown hardware or software incompatibility. My CPU is old (a Core i7-860) and my GPU is a 1650 SUPER, though I have no idea where the problem actually lies; this is just my guess. I don't think it's a normal problem, since a lot of the software used to run this is the same as or similar to Stable Diffusion, at least to my untrained eye, and I can run SD just fine, if only at low resolutions. But something about these text-gen AIs has an issue with my PC.

Userbingd avatar Apr 27 '23 08:04 Userbingd

Had the same problem. Upon checking the file hash, I realized the download was corrupted; re-downloading the file gpt-x-alpaca-13b-native-4bit-128g-cuda.pt worked. Try this.
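
To verify a download against the SHA256 listed on the model's Hugging Face files page, a minimal sketch (the path here is an example):

import hashlib

def sha256sum(path, chunk_size=1 << 20):
    # Hash the file in 1 MiB chunks so large checkpoints don't need to fit in RAM.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

print(sha256sum("models/gpt-x-alpaca-13b-native-4bit-128g-cuda.pt"))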

Syncriix avatar May 05 '23 22:05 Syncriix

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] avatar Sep 01 '23 23:09 github-actions[bot]