
Option to enable the GPU

Open · kk2491 opened this issue 1 year ago · 1 comment

Hi All,

First of all, thank you for this excellent tool, which makes it very easy to run LLM models without any hassle.

I am aware that the main purpose of localllm is to eliminate the dependency on GPUs and run the models on the CPU. However, I wanted to know whether there is an option to offload some of the layers to the GPU.

Machine: Compute Engine instance in GCP
OS: Ubuntu 22.04 LTS
GPU: Tesla T4

The steps I have followed so far are given below:

  1. Installed the NVIDIA driver on the Compute Engine instance. The nvidia-smi output is shown below.
    [screenshot: nvidia-smi output]
  2. Assuming localllm does not directly provide an option to enable the GPU (I may be wrong here), I cloned the llama-cpp-python repository and updated n_gpu_layers to 4 in llama_cpp/server/settings.py (a sketch of the usage I am aiming for is shown after this list).
  3. Built the package by running pip install -e .; the complete steps are given here.
  4. Killed localllm and started it again.
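
For clarity, this is the kind of GPU offload I am hoping to get. Below is a minimal sketch of how I understand n_gpu_layers is meant to be used with llama-cpp-python, assuming the package has been compiled with CUDA support; the model path and layer count are just placeholders:

```python
from llama_cpp import Llama

# Assumes llama-cpp-python was built with the CUDA backend enabled, e.g.
#   CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install -e .
# (with a CPU-only build, n_gpu_layers has no effect and everything
# stays on the CPU)
llm = Llama(
    model_path="/path/to/model.gguf",  # placeholder path
    n_gpu_layers=4,                    # number of layers to offload to the GPU
)

output = llm("Q: What is a Tesla T4? A:", max_tokens=32)
print(output["choices"][0]["text"])
```

My assumption is that editing llama_cpp/server/settings.py only changes the default value, and that the package also needs to be rebuilt with the CUDA backend for the offload to actually take effect; please correct me if that is wrong.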

However, I still see that the GPU is not being utilized.

Are the above steps correct or did I miss anything here?

Thank you,
KK

kk2491 · Mar 01 '24 06:03

Thanks for the question @kk2491, I'll take a look and see what I can find!

bobcatfish · Mar 05 '24 19:03