
Any way to get the GPU to work?

Open rexzhang2023 opened this issue 1 year ago • 9 comments

Can anyone suggest how to make the GPU work with this project?

rexzhang2023 avatar May 12 '23 00:05 rexzhang2023

Chances are, it's already partially using the GPU. As it stands, it's a script linking together LLaMa.cpp embeddings, a Chroma vector DB, and GPT4All. GPT4All might be using PyTorch with GPU support, Chroma is probably already heavily CPU-parallelized, and LLaMa.cpp runs only on the CPU.

It's also worth noting that two LLMs are used with different inference implementations, meaning you may have to load the model twice.

walking-octopus avatar May 12 '23 08:05 walking-octopus

I watched my GPU usage and it was not touched.

mmike87 avatar May 12 '23 13:05 mmike87

Does this mean it works only with the CPU?

I currently want to try this.

Also, could you add some info to the README about the hardware requirements?

pabl-o-ce avatar May 13 '23 02:05 pabl-o-ce

No, LlamaCpp was designed to use only CPU resources. For GPU inference you'd have to use the native LLaMA model from Facebook.

su77ungr avatar May 14 '23 06:05 su77ungr

I can get it to work on Ubuntu 22.04 by installing llama-cpp-python with cuBLAS:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.48

If the installation fails because it doesn't find CUDA, it's probably because you need to add the CUDA install path to the PATH environment variable:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}

Anyway, it only uses less than 1 GB of VRAM on an RTX 2060 with 6 GB, so I don't know if something is still missing.
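If VRAM usage stays that low, the model layers may not actually be offloaded. Here is a minimal sketch of what to try, assuming you are calling llama-cpp-python directly; the model path and layer count are placeholders, and n_gpu_layers needs a build recent enough to support layer offloading:

```python
from llama_cpp import Llama

# With n_gpu_layers=0 (the default), a cuBLAS build still uses the GPU for
# prompt-processing matmuls but keeps the weights in system RAM, which would
# explain VRAM usage staying under 1 GB.
llm = Llama(
    model_path="./models/ggml-model-q4_0.bin",  # placeholder path
    n_gpu_layers=32,  # tune to your VRAM; 0 keeps all weights on the CPU
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```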

iker-lluvia avatar May 14 '23 15:05 iker-lluvia

Aren't you just emulating the CPU? I don't know if there's even a working port with GPU support.

su77ungr avatar May 14 '23 20:05 su77ungr

Aren't you just emulating the CPU? I don't know if there's even a working port with GPU support.

It shouldn't be. The llama.cpp library can perform BLAS acceleration on the CUDA cores of an Nvidia GPU through cuBLAS, and I expect llama-cpp-python to do the same when installed with cuBLAS. Is there any fast way to verify that the GPU is being used, other than running nvidia-smi or nvtop?
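One quick check that doesn't need nvidia-smi, sketched here on the assumption that llama.cpp's startup log is visible: the library prints a system_info line when a model loads, and a cuBLAS build reports BLAS = 1 there.

```python
from llama_cpp import Llama

# Loading a model makes llama.cpp print a system_info line to stderr.
# Look for "BLAS = 1" in it; a CPU-only build prints "BLAS = 0".
llm = Llama(model_path="./models/ggml-model-q4_0.bin")  # placeholder path
```

Note this only confirms the build is BLAS-enabled; actual GPU utilization during inference still shows up most reliably in nvidia-smi or nvtop.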

iker-lluvia avatar May 15 '23 07:05 iker-lluvia

Never mind, my collaborator found a way; see the linked comment.

su77ungr avatar May 15 '23 09:05 su77ungr

If anyone still can't figure this out, I explained in detail how I got it to work in issue #217.

shondle avatar May 22 '23 14:05 shondle