
DefaultCPUAllocator: not enough memory


Describe the bug

After running "start_windows.bat", it starts to load the model "models\anon8231489123_vicuna-13b-GPTQ-4bit-128g\vicuna-13b-4bit-128g.safetensors" and then I get a RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 141557760 bytes. I have tried solutions found on the internet for this, but they didn't solve the issue. I have tried changing the virtual memory (paging file) settings for the SSD and HDD, and using parameters like "--disk, --pre_layer 25, --wbits 4, --groupsize 128".

These are my parameters in webui.py: run_cmd("python server.py --chat --auto-devices --disk --wbits 4 --groupsize 128 --model-menu", environment=True)
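For reference, this is roughly what the line looked like when I added the --pre_layer offload flag (the exact flag combination here is a sketch; 25 is the number of transformer layers kept on the GPU, with the rest offloaded to CPU):

# Sketch of the same launch line with GPTQ CPU offload enabled via --pre_layer
run_cmd("python server.py --chat --auto-devices --disk --wbits 4 --groupsize 128 --pre_layer 25 --model-menu", environment=True)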

Is there an existing issue for this?

  • [x] I have searched the existing issues

Reproduction

If any of you know a possible solution, could you please share it?

Screenshot

No response

Logs

INFO:Gradio HTTP request redirected to localhost :)
bin D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117_nocublaslt.dll
INFO:Loading anon8231489123_vicuna-13b-GPTQ-4bit-128g...
INFO:Found the following quantized model: models\anon8231489123_vicuna-13b-GPTQ-4bit-128g\vicuna-13b-4bit-128g.safetensors
Traceback (most recent call last):
  File "D:\AI\oobabooga_windows\text-generation-webui\server.py", line 919, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "D:\AI\oobabooga_windows\text-generation-webui\modules\models.py", line 159, in load_model
    model = load_quantized(model_name)
  File "D:\AI\oobabooga_windows\text-generation-webui\modules\GPTQ_loader.py", line 175, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, shared.args.pre_layer)
  File "D:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\llama_inference_offload.py", line 214, in load_quant
    model = LlamaForCausalLM(config)
  File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 614, in __init__
    self.model = LlamaModel(config)
  File "D:\AI\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\llama_inference_offload.py", line 21, in __init__
    super().__init__(config)
  File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in __init__
    self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
  File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 445, in <listcomp>
    self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
  File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 256, in __init__
    self.mlp = LlamaMLP(
  File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 153, in __init__
    self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
  File "D:\AI\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 141557760 bytes.

Done!
Press any key to continue . . .
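
A note on the numbers in the traceback: the failed allocation is exactly one LLaMA-13B up_proj weight at half precision (GPTQ-for-LLaMa switches the default dtype to fp16 before building the model skeleton), and the full 13B fp16 skeleton needs roughly 26 GB, well beyond 8 GB of RAM plus a 3.78 GB paging file. A minimal check, assuming the standard LLaMA-13B dimensions:

hidden_size = 5120         # LLaMA-13B hidden size
intermediate_size = 13824  # LLaMA-13B MLP intermediate size
bytes_per_param = 2        # fp16
print(hidden_size * intermediate_size * bytes_per_param)  # 141557760, matching the error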

System Info

CPU: AMD Athlon X4 645 3.1 GHz 4 Core
GPU: Nvidia GTX 1060 6GB Ti OC
RAM: 8 GB
SSD: 1.2 GB available (full...)
HDD: 700 GB available (installed on this drive)
OS: Win 10 Home (x64)
Total VRAM: 11.8 GB
Available VRAM: 5.88 GB
Paging file space: 3.78 GB

nullptrmachine avatar May 10 '23 18:05 nullptrmachine

You have 8 GB of RAM and 6 GB of VRAM. Sorry, a 13B isn't happening. Maybe through llama.cpp with mmap or whatever (a sketch follows below).

Ph0rk0z avatar May 11 '23 15:05 Ph0rk0z
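
For anyone following up on the llama.cpp suggestion above, here is a minimal sketch using the llama-cpp-python bindings with memory-mapped loading, so the OS pages weights in on demand rather than committing the whole model to RAM up front. The model filename is hypothetical, and use_mmap is already the default; it is shown explicitly for clarity.

from llama_cpp import Llama  # pip install llama-cpp-python

# Load a GGML-quantized Vicuna with mmap, so weights are paged in on demand
# instead of being fully resident in RAM up front.
llm = Llama(
    model_path="models/vicuna-13b.ggmlv3.q4_0.bin",  # hypothetical filename
    n_ctx=512,
    use_mmap=True,  # the default; shown explicitly
)

result = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(result["choices"][0]["text"])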

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.

github-actions[bot] avatar Jun 10 '23 23:06 github-actions[bot]