text-generation-webui
DefaultCPUAllocator: not enough memory
Describe the bug
After running "start_windows.bat" it starts to load the model "models\anon8231489123_vicuna-13b-GPTQ-4bit-128g\vicuna-13b-4bit-128g.safetensors", and then I get a RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 141557760 bytes.
I have tried solutions I found on the internet for this, but none of them solved the issue.
I have tried changing the virtual memory (paging file) settings for the SSD and HDD, and using parameters like "--disk", "--pre_layer 25", "--wbits 4", "--groupsize 128".
These are my parameters in webui.py: run_cmd("python server.py --chat --auto-devices --disk --wbits 4 --groupsize 128 --model-menu", environment=True)
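For reference, the failed allocation size matches a single fp16 up_proj weight of a LLaMA-13B MLP block, which is exactly the layer the traceback below stops at. A minimal sketch of that arithmetic, assuming the standard LLaMA-13B dimensions (hidden_size 5120, intermediate_size 13824; these are assumptions, not values read from this model's config):

```python
# Back-of-envelope check of the failed allocation.
# hidden_size / intermediate_size are assumed LLaMA-13B values, not taken from the config.
hidden_size = 5120
intermediate_size = 13824
bytes_per_param = 2  # 2 bytes/param implies the CPU-side skeleton is built in fp16

up_proj_bytes = hidden_size * intermediate_size * bytes_per_param
print(up_proj_bytes)  # 141557760 -- the exact number in the RuntimeError
```

So the failure happens while PyTorch is still allocating the empty model skeleton in system RAM (the traceback is inside LlamaForCausalLM(config)), before any quantized weights are loaded.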
Is there an existing issue for this?
- [x] I have searched the existing issues
Reproduction
If any of you know a possible solution, could you please share it?
Screenshot
No response
Logs
INFO:Gradio HTTP request redirected to localhost :)
bin D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117_nocublaslt.dll
INFO:Loading anon8231489123_vicuna-13b-GPTQ-4bit-128g...
INFO:Found the following quantized model: models\anon8231489123_vicuna-13b-GPTQ-4bit-128g\vicuna-13b-4bit-128g.safetensors
Traceback (most recent call last):
File "D:\AI\oobabooga_windows\text-generation-webui\server.py", line 919, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "D:\AI\oobabooga_windows\text-generation-webui\modules\models.py", line 159, in load_model
model = load_quantized(model_name)
File "D:\AI\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 175, in load_quantized
model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, shared.args.pre_layer)
File "D:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\llama_inference_offload.py", line 214, in load_quant
model = LlamaForCausalLM(config)
File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 614, in __init__
self.model = LlamaModel(config)
File "D:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\llama_inference_offload.py", line 21, in __init__
super().__init__(config)
File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in __init__
self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in <listcomp>
self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 256, in __init__
self.mlp = LlamaMLP(
File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 153, in __init__
self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\linear.py", line 96, in __init__
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 141557760 bytes.
Done!
Press any key to continue . . .
System Info
CPU: AMD Athlon X4 645 3.1 GHz 4 Core
GPU: Nvidia GTX 1060 6GB Ti OC
RAM: 8 GB
SSD: 1,2 GB available (full...)
HDD: 700 GB available (installed on this drive)
OS: Win 10 Home (x64)
Total VRAM: 11,8 GB
Available VRAM: 5,88 GB
Paging file space: 3,78 GB
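As a side note, a quick way to confirm how much RAM and page-file space is actually free right before loading (psutil is a third-party package, not something the webui requires):

```python
# Hypothetical pre-flight check using psutil (pip install psutil).
import psutil

vm = psutil.virtual_memory()
sm = psutil.swap_memory()
print(f"RAM available:  {vm.available / 1024**3:.2f} GiB")
print(f"Page file free: {sm.free / 1024**3:.2f} GiB")
```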
You have 8 GB of RAM and 6 GB of VRAM. Sorry... a 13B isn't happening. Maybe through llama.cpp and its mmap loading.
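Rough numbers behind that assessment (the parameter count and bytes-per-parameter below are generic 13B assumptions, not measurements from this setup):

```python
# Rough memory math for a 13B-parameter model.
params = 13e9
fp16_gib = params * 2 / 1024**3         # ~24.2 GiB for the full fp16 skeleton built on CPU
gptq_4bit_gib = params * 0.5 / 1024**3  # ~6 GiB for the 4-bit quantized weights alone

ram_plus_pagefile_gib = 8 + 3.78        # from the System Info above
print(fp16_gib, gptq_4bit_gib, ram_plus_pagefile_gib)
```

Even the 4-bit weights are a tight fit for 6 GB of VRAM once activations and context are counted, which is why llama.cpp with mmap-based loading is the suggested fallback.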
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.