localGPT
RuntimeError: out of memory
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
NVIDIA GeForce MX250, 16GB system memory
I am also getting a runtime error after running run_localGPT.py:
"RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 180355072 bytes."
I have an AMD Ryzen 9 5900X 12-core 3.7GHz CPU, an NVIDIA RTX 3070 GPU w/ 8GB VRAM, and 16GB RAM. Is more memory required to run localGPT?
I do have the repo saved to my external HD, but I noticed the LLM is saved on my C: drive. Could this cause the surge in memory usage? If so, is it possible to store the models locally in the repo, similar to privateGPT?
Thanks in advance.
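The model's location on disk shouldn't by itself cause the memory surge (the allocation failure is RAM, not disk space), but you can redirect downloads into the repo. By default Hugging Face caches models under your user profile on C:. A minimal sketch, assuming a models/ folder of your choosing inside the repo (the folder name is illustrative, not something localGPT defines):

import os

# Point the Hugging Face cache at a folder inside the repo; this must be set
# before transformers / InstructorEmbedding are imported.
# "models" is a hypothetical folder name, not part of localGPT.
os.environ["HF_HOME"] = os.path.abspath("models")

from langchain.embeddings import HuggingFaceInstructEmbeddings

embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large",
    model_kwargs={"device": "cuda"},
)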
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 16777216 bytes.
I keep getting the same response.
In ingest.py, replace the embedding model with a smaller one:
# Create embeddings
# instructor-xl gives an out-of-memory error, so use a smaller model;
# if that still fails, drop down to instructor-base
# a larger model can give better results, but it's useless if you can't load it
embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large",
    model_kwargs={"device": device},
)
The same embeddings are also created in run_localGPT.py, so replace the model there too:
embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large",
    model_kwargs={"device": device},
)
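If you're not sure which size will fit, here's a sketch (my own addition, not localGPT code) that falls back to smaller instructor models when the larger one runs out of memory:

from langchain.embeddings import HuggingFaceInstructEmbeddings

device = "cuda"
embeddings = None
# Try progressively smaller instructor models; a CUDA OOM surfaces as a RuntimeError.
for model_name in ("hkunlp/instructor-large", "hkunlp/instructor-base"):
    try:
        embeddings = HuggingFaceInstructEmbeddings(
            model_name=model_name,
            model_kwargs={"device": device},
        )
        print(f"Loaded {model_name}")
        break
    except RuntimeError as err:
        print(f"{model_name} did not fit: {err}")
if embeddings is None:
    raise SystemExit("No instructor model fit in memory; try device='cpu'")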
And run_localGPT.py also downloads the 7B Vicuna model (TheBloke/vicuna-7B-1.1-HF), which is too large as well, so we'll probably need a smaller model there too to prevent CUDA memory errors.
I've tried changing it to cerebras/Cerebras-GPT-2.7B, which I know fits on my 3070 card, but that didn't work. Maybe someone else has time to find a smaller model that does work with this script.
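For anyone experimenting, here is a minimal sketch of loading a smaller model through the generic transformers auto classes. The model name is illustrative only, not a tested recommendation; one possible reason Cerebras-GPT failed is that localGPT's loader may be hard-coded to the Llama-specific classes, which Cerebras-GPT does not use.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# "facebook/opt-1.3b" is just an example of a model small enough for 4-8GB cards.
model_id = "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # float16 halves VRAM versus float32
).to("cuda")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256)
print(pipe("Summarize the Constitution in one sentence.")[0]["generated_text"])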
I have 4GB and I'm running out of memory. If we can get it to run on my old NVIDIA card, then it will run anywhere.
OK, if you run this on Linux, try these steps. It looks like the Constitution needs 66GB by itself. FYI, this is very slow...
# Turn off all swap processes
sudo swapoff -a
# Create a 100GB swap file (replacing the default 512MB swap)
sudo dd if=/dev/zero of=/swapfile bs=1G count=100
# Restrict permissions so mkswap doesn't warn about insecure access
sudo chmod 600 /swapfile
# Make the file usable as swap
sudo mkswap /swapfile
# Activate the swap file
sudo swapon /swapfile
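Once it's active, swapon --show (or free -h) should list the new 100GB swap file. Note that the file won't be used as swap after a reboot unless you also add an entry for it to /etc/fstab.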
Did it work?
Yes, it's working now, but it's very slow.
> Question:
summarize this document in one sentence
> Answer:
This is the Constitution of the United States, which outlines the basic structure and function of the government of the country.