
RuntimeError: out of memory

Open georgeqin96 opened this issue 1 year ago • 10 comments

```
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
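(Side note for anyone debugging this: `CUDA_LAUNCH_BLOCKING` has to be set before CUDA is initialized, i.e. before the first `torch` import in the process. A minimal sketch of doing that from Python rather than the shell:)

```python
import os

# Must be set before torch initializes CUDA, so do it before the import.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402

# With synchronous kernel launches, the stack trace now points at the
# call that actually failed instead of some later API call.
print(torch.cuda.is_available())
```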

georgeqin96 · May 31 '23 14:05

NVIDIA GeForce MX250, 16 GB memory

georgeqin96 · May 31 '23 14:05

I am also getting a runtime error after running run_localGPT.py:

"RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 180355072 bytes."

I have AMD Ryzen 9 5900X 12-core 3.7Ghz CPU, NVIDIA RTX 3070 GPU w/ 8GB VRAM, and 16GB RAM. Is more memory required to run localGPT?

I do have the repo saved to my external HD, but I noticed the LLM is saved on my C: drive. Would this cause the surge in memory usage? If so, is it possible to store the models locally in the repo, similar to privateGPT?

Thanks in advance.
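(On the cache-location question: Hugging Face models default to a cache under the user profile, which can be redirected with environment variables. A minimal sketch, using the standard `HF_HOME` / `SENTENCE_TRANSFORMERS_HOME` variables rather than anything localGPT-specific; the `./models` folder name is just an example:)

```python
import os

# Point the model caches at a folder inside the repo instead of the
# default location under the user profile on C:. Set these before any
# model is imported or downloaded.
repo_models = os.path.abspath("./models")
os.environ["HF_HOME"] = repo_models                     # transformers / hub cache
os.environ["SENTENCE_TRANSFORMERS_HOME"] = repo_models  # embedding model cache
```

Note that this only changes where the weights live on disk; it won't reduce the RAM or VRAM needed to load them.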

fjsikora · Jun 01 '23 04:06

RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 16777216 bytes.

I keep getting the same error.

Hermit07 · Jun 01 '23 09:06

In ingest.py, replace the embedding model with a smaller one:

```python
# Create embeddings.
# instructor-xl gives an out-of-memory error, so use a smaller one; if you
# still get an error, use instructor-base. The larger model can give better
# results, but it's useless if you can't load it.
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large",
                                           model_kwargs={"device": device})
```

benninkcorien · Jun 02 '23 09:06

It's also used in the run_localGPT.py file, so replace it there too:

```python
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large",
                                           model_kwargs={"device": device})
```
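(If instructor-large still doesn't fit, one way to encode the "-base fallback" advice above is a loop over model sizes. A minimal sketch, assuming an out-of-memory during loading surfaces as a RuntimeError; the helper name is hypothetical:)

```python
import torch
from langchain.embeddings import HuggingFaceInstructEmbeddings

def load_embeddings(device: str) -> HuggingFaceInstructEmbeddings:
    # Try the largest model first and fall back to smaller ones on OOM.
    for model_name in ("hkunlp/instructor-large", "hkunlp/instructor-base"):
        try:
            return HuggingFaceInstructEmbeddings(
                model_name=model_name, model_kwargs={"device": device})
        except RuntimeError:  # CUDA/CPU out-of-memory raises RuntimeError
            if device == "cuda":
                torch.cuda.empty_cache()
    raise RuntimeError("no embedding model fit in memory")
```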

benninkcorien · Jun 02 '23 09:06

Aaaand run_localGPT.py also downloads the 7B Vicuna model TheBloke/vicuna-7B-1.1-HF, which is too large as well. So we'll probably need a smaller model for that one too, to prevent CUDA memory errors.

I've tried changing it to cerebras/Cerebras-GPT-2.7B, which I know fits on my 3070 card, but that didn't work. Maybe someone else has time to find a smaller model that does work with this script.
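(One possible reason the Cerebras swap failed: if the script loads weights with Llama-specific classes rather than the generic Auto* ones, a non-Llama architecture will error out regardless of size. A hedged sketch of a lower-memory load path using the generic transformers classes; the float16 and device_map settings are assumptions, untested against this repo:)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline

# Any causal LM id works here; float16 roughly halves the memory footprint,
# and device_map="auto" (requires the accelerate package) offloads layers to
# CPU RAM when the GPU is full.
model_id = "TheBloke/vicuna-7B-1.1-HF"  # swap in a smaller model if needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer,
                max_new_tokens=256)
llm = HuggingFacePipeline(pipeline=pipe)
```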

benninkcorien · Jun 02 '23 09:06

I have 4 GB and I'm running out of memory. If we can get it to run on my old NVIDIA card, then it will run anywhere.

msoler75 · Jun 04 '23 01:06

OK, if you run this on Linux, try these steps. It looks like the Constitution example needs 66 GB by itself. FYI, this is very slow...

```bash
# Turn off all swap processes
sudo swapoff -a

# Resize the swap file (from 512 MB to 100 GB)
sudo dd if=/dev/zero of=/swapfile bs=1G count=100

# Make the file usable as swap
sudo mkswap /swapfile

# Activate the swap file
sudo swapon /swapfile
```
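(To confirm the new swap is actually visible before kicking off a long run, a quick check from Python; assumes the third-party psutil package is installed:)

```python
import psutil  # third-party: pip install psutil

# virtual_memory() reports RAM; swap_memory() reports the swap just added.
ram, swap = psutil.virtual_memory(), psutil.swap_memory()
print(f"RAM:  {ram.available / 2**30:.1f} GiB free of {ram.total / 2**30:.1f} GiB")
print(f"Swap: {swap.free / 2**30:.1f} GiB free of {swap.total / 2**30:.1f} GiB")
```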

NQevxvEtg · Jun 05 '23 16:06

And the result?

zoomspoon1 · Jun 05 '23 17:06

Yes, it's working now, just very slow.

> Question:
> summarize this document in one sentence

> Answer:
> This is the Constitution of the United States, which outlines the basic structure and function of the government of the country.

NQevxvEtg · Jun 05 '23 18:06