private-gpt
Any way to get the GPU to work?
Can anyone suggest how to make the GPU work with this project?
Chances are, it's already partially using the GPU. As it is now, it's a script linking together LLaMa.cpp embeddings, the Chroma vector DB, and GPT4All. GPT4All might be using PyTorch with GPU, Chroma is probably already heavily CPU-parallelized, and LLaMa.cpp runs only on the CPU.
It's also worth noting that two LLMs are used with different inference implementations, meaning you may have to load the model twice.
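Since any GPU use on the GPT4All side would go through PyTorch, a quick first check is whether PyTorch can even see a CUDA device. This is just a sketch and assumes nothing about the project itself; it reports `None` when PyTorch isn't installed:

```python
def torch_cuda_available():
    """Report whether PyTorch can see a CUDA GPU.

    Returns True/False, or None when PyTorch isn't installed.
    """
    try:
        import torch  # optional dependency; may be absent
    except ImportError:
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    print(torch_cuda_available())
```

If this prints `False` or `None`, the PyTorch path definitely isn't touching the GPU, regardless of what the rest of the stack does.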
I watched my GPU usage and it was not touched.
Does this mean that this works only with the CPU? I currently want to try this.
Also, could you add some info to the README about the hardware requirements?
No, LlamaCpp was designed to use only CPU resources. For GPU you'd have to use the native LLaMA model from Facebook.
I got it to work on Ubuntu 22.04 by installing llama-cpp-python with cuBLAS:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.48
If installation fails because it doesn't find CUDA, it's probably because you need to add the CUDA install path to the PATH environment variable:
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
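A small sketch of that PATH fix plus a sanity check that the CUDA compiler is actually visible before re-running the build (the `/usr/local/cuda` prefix is the default install location; adjust if yours differs):

```shell
# Put the CUDA toolchain on PATH (default install prefix assumed).
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}

# Verify nvcc is reachable before re-running the cuBLAS build of llama-cpp-python.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | tail -n 1
else
    echo "nvcc not found - check your CUDA install path"
fi
```

If `nvcc` still isn't found, the `CMAKE_ARGS="-DLLAMA_CUBLAS=on"` build above will fail the same way.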
Anyway, it only uses less than 1 GB of VRAM on an RTX 2060 with 6 GB, so I don't know if something is still missing.
Aren't you just emulating the CPU? Idk if there's even a working port for GPU support.
It shouldn't. The llama.cpp library can perform BLAS acceleration using the CUDA cores of the Nvidia GPU through cuBLAS. I expect llama-cpp-python to do so as well when installing it with cuBLAS.
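One way to confirm the build itself, rather than watching the GPU: the low-level llama-cpp-python bindings expose llama.cpp's system-info string, which reports `BLAS = 1` when the library was compiled against a BLAS backend such as cuBLAS. A sketch, assuming that binding exists in your installed version:

```python
def llama_built_with_blas():
    """Return True if llama-cpp-python reports BLAS support, False if not,
    or None when the package (or this low-level binding) is unavailable.
    """
    try:
        import llama_cpp
        # Low-level ctypes binding; returns llama.cpp's system info as bytes.
        info = llama_cpp.llama_print_system_info()
    except (ImportError, AttributeError):
        return None
    return "BLAS = 1" in info.decode("utf-8", "replace")
```

A `False` here means the wheel was built without cuBLAS and no amount of runtime configuration will get it onto the GPU; rebuild with the `CMAKE_ARGS` shown above.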
Any fast way to verify if the GPU is being used, other than running nvidia-smi or nvtop?
Nvm, my collaborator found a way, see below.
If anyone still can't figure this out, I explained how I got it to work in detail here (issue #217).