Error loading gpt4-x-alpaca-13b-native-4bit-128g on Alienware M15 Ryzen Edition R5 laptop

Open omar-shatla opened this issue 2 years ago • 14 comments

[I'm aware that similar cases of this issue have been reported by other members, and one stated that, like me, he was following the tutorial steps from the YouTuber Aitrepreneur. I just wanted to emphasize this problem and my specific case because I am really confused by it. The same issue also occurred when trying to use the web UI for Vicuna.]

Description:

I am trying to use the Oobabooga web UI on my Alienware M15 Ryzen Edition R5 laptop. However, I am receiving the following error message:

RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 327690240 bytes.

I have checked the installation instructions and the documentation for the oobabooga web UI, and I believe it is installed correctly and is compatible with my hardware configuration. I have also tried restarting it, but the error persists.
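For scale, the allocation that failed is only about 312 MiB. A minimal sketch along these lines (it assumes psutil is installed; it is separate from the web UI) compares that figure with the RAM Python actually sees as available:

```python
# Minimal sketch: compare the failed allocation from the RuntimeError above
# with the memory Python can see. Requires: pip install psutil
import psutil

failed_alloc = 327_690_240  # bytes, taken from the error message
print(f"Failed allocation: {failed_alloc / 1024**2:.1f} MiB")

vm = psutil.virtual_memory()
print(f"Total RAM: {vm.total / 1024**3:.1f} GiB, "
      f"available: {vm.available / 1024**3:.1f} GiB")
```

If the available figure is much larger than the failed allocation, the limit being hit is probably the Windows commit limit (physical RAM plus page file) rather than free RAM itself, which would explain why the error can appear even when Task Manager shows memory to spare.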

Laptop specs:

CPU: AMD Ryzen 7 5800H
GPU: NVIDIA GeForce RTX 3050 Ti Laptop GPU
RAM: 16GB
Storage: 512GB SSD

I would appreciate any help you can provide in resolving this issue.

Thank you

omar-shatla avatar Apr 10 '23 16:04 omar-shatla

Same issue as https://github.com/oobabooga/text-generation-webui/issues/955#issue-1659852930. That has been closed though, even though it's still an issue for a lot of us. We should use this thread if possible from now on.

MancV21 avatar Apr 10 '23 16:04 MancV21

*nods* I want it to run. These out-of-memory errors are just strange to me; it's definitely not normal.

SuperFurias avatar Apr 10 '23 17:04 SuperFurias

I finally got the model to load, and the webui to start.

Indeed, I had to assign 100 GB of virtual memory, and I also reverted to an older version of GPTQ_loader.py. I also had to ensure that my model name was the same as the directory it was in, except the model file needed an extra -4bit suffix before the .pt extension, i.e. models\gpt-x-alpaca-13b-native-4bit-128g\gpt-x-alpaca-13b-native-4bit-128g-4bit.pt
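To double-check that naming convention before launching, a small filesystem check along these lines works (just a sketch for verification, not part of the web UI; adjust the directory to your own model):

```python
# Sketch: verify the checkpoint follows the <dir-name>\<dir-name>-4bit.pt
# pattern described above. Pure filesystem check, independent of the web UI.
from pathlib import Path

model_dir = Path(r"models\gpt-x-alpaca-13b-native-4bit-128g")  # adjust to your model
expected = model_dir / f"{model_dir.name}-4bit.pt"

if expected.exists():
    print(f"Found expected checkpoint: {expected}")
else:
    found = [p.name for p in model_dir.glob("*.pt")] + \
            [p.name for p in model_dir.glob("*.safetensors")]
    print(f"Missing {expected.name}; files present: {found}")
```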

I did also install Protobuf in my earlier attempts, but I'm not sure whether it made a difference or not.

Using this: call python server.py --listen --listen-port 7870 --pre_layer 31 --chat --wbits 4 --groupsize 128

It is, however, incredibly slow, even though I am only using 2% CPU, 6.7 GB of VRAM, and 8.5 GB of RAM according to Task Manager.

johnswan avatar Apr 10 '23 18:04 johnswan

Managed to run it on an RTX 3050 as well, by increasing virtual memory to over 100 GB in the Windows settings and using the --pre_layer flag.
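If you take the page-file route, a quick way to confirm Windows actually picked up the larger size is to check what psutil reports; on Windows its swap figures roughly correspond to the page file (a minimal sketch, assuming psutil is installed):

```python
# Minimal sketch: confirm the enlarged page file is visible from Python.
# On Windows, psutil's swap_memory() roughly reflects the page file.
import psutil

ram = psutil.virtual_memory()
swap = psutil.swap_memory()
print(f"Physical RAM: {ram.total / 1024**3:.1f} GiB")
print(f"Page file   : {swap.total / 1024**3:.1f} GiB ({swap.percent:.0f}% in use)")
```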

kenedos avatar Apr 10 '23 18:04 kenedos

It's going to run slowly using virtual RAM, but as long as it runs, I suppose that's something. I still think something is off, though, as others aren't having to do this and can run it without increasing their virtual RAM. I'll wait for now to see if it's rectified.

MancV21 avatar Apr 10 '23 18:04 MancV21

Increasing the page file to 100 GB also solved my problem here. I used this line to start: call python server.py --auto-device --chat --wbits 4 --groupsize 128 --gpu-memory 5 --pre_layer 25

edemir206 avatar Apr 10 '23 19:04 edemir206

How much system RAM do you have?

MancV21 avatar Apr 10 '23 19:04 MancV21

16 GB of system RAM and an RTX 2060 6GB, in an Acer Predator laptop.

edemir206 avatar Apr 10 '23 19:04 edemir206

OK, it seems to be running pretty slowly. What are the memory requirements for running the Vicuna 4-bit 128g model?

edemir206 avatar Apr 10 '23 19:04 edemir206

I have the same issue with an RTX 2070 Super 8GB on Ubuntu Linux, trying to run the vicuna-13b-GPTQ-4bit-128g and gpt4-x-alpaca-13b-native-4bit-128g models with the --auto-devices --chat --wbits 4 --groupsize 128 flags. I get the error (GPU 0; 7.78 GiB total capacity; 5.87 GiB already allocated; 13.69 MiB free; 6.08 GiB reserved in total by PyTorch). It does the same with the --disk flag. I also tried running it with DeepSpeed, and it locked my PC for a few minutes until the OS decided to kill the process.

kadattack avatar Apr 11 '23 15:04 kadattack

--auto-devices does nothing for 4-bit, AFAIK. The only way to offload from the GPU with 4-bit is to use --pre_layer 20 or a similar value. For 8 GB of VRAM, 30 works but OOMs on longer conversations; to get the full 2k tokens of context, I think you need to start with --pre_layer 15 or thereabouts. Lowering this number also lowers the speed, so find the balance between maximum context and speed.
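For anyone trying to pick a --pre_layer value, a rough back-of-the-envelope sketch of the trade-off is below; the layer count, hidden size, bytes-per-parameter and cache figures are my own approximations for a 4-bit LLaMA-13B, not numbers taken from the repo, so treat the output as a ballpark only:

```python
# Rough sketch: why --pre_layer trades speed and context for VRAM.
# Assumptions (approximate, my own): 40 transformer layers, hidden size 5120,
# ~13e9 parameters, ~0.5 bytes/param for 4-bit weights, fp16 KV cache.
N_LAYERS = 40
HIDDEN = 5120
TOTAL_PARAMS = 13e9
WEIGHT_BYTES_PER_PARAM = 0.5                   # 4-bit weights, ignoring scales/zeros
KV_BYTES_PER_TOKEN_PER_LAYER = 2 * HIDDEN * 2  # K and V tensors in fp16

def vram_estimate_gib(pre_layer: int, context_tokens: int) -> float:
    """Very rough VRAM estimate for the layers kept on the GPU."""
    weights = pre_layer / N_LAYERS * TOTAL_PARAMS * WEIGHT_BYTES_PER_PARAM
    kv_cache = pre_layer * context_tokens * KV_BYTES_PER_TOKEN_PER_LAYER
    return (weights + kv_cache) / 1024**3

for pre_layer in (15, 25, 30, 40):
    print(f"--pre_layer {pre_layer}: ~{vram_estimate_gib(pre_layer, 2048):.1f} GiB "
          f"at 2048 tokens, plus CUDA context and activations")
```

The estimate grows with both the layer count and the context length, which is why a value that works for short chats can still OOM once a conversation approaches the full 2k tokens.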

ghost avatar Apr 11 '23 16:04 ghost

I'll drop this suggestion here too: try using the safetensors version of the model from here. Just know that you need to download absolutely everything from that repository (well, except for .gitattributes and README.md); you can't just download the model file and hope it works. In Colab, with 12 GB of system RAM and 16 GB of VRAM, everything loads just fine running python server.py --model gpt4-x-alpaca-13b-native-4bit-128g-cuda --wbits 4 --groupsize 128, and it's fast too.
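If you would rather pull the whole repository in one command than download each file by hand, huggingface_hub's snapshot_download can do it; this is only a sketch, and the repo_id below is a placeholder for the repository the comment points to:

```python
# Sketch: download every file from a Hugging Face model repo except the two
# that aren't needed. The repo_id is a placeholder; substitute the repository
# linked in the comment above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="someuser/gpt4-x-alpaca-13b-native-4bit-128g-cuda",  # placeholder
    local_dir="models/gpt4-x-alpaca-13b-native-4bit-128g-cuda",
    ignore_patterns=[".gitattributes", "README.md"],
)
```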

Daviljoe193 avatar Apr 12 '23 04:04 Daviljoe193

Just throwing this in here too. I am getting an out-of-memory error on twin 3090 cards with 64 GB of RAM and 1 TB of swap. There is something odd going on with how it loads things.

jmsether avatar Apr 13 '23 06:04 jmsether

I tried that version, and it still didn't work. Same error.

Jonseed avatar Apr 15 '23 17:04 Jonseed

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] avatar Oct 03 '23 23:10 github-actions[bot]