localGPT
Program is running on CPU while set to GPU
Running on a Threadripper + RTX A6000 with 48 GB of VRAM.
I did the installation, but I ran into some issues.
First one: I couldn't ingest because I got this error:
Torch not compiled with CUDA enabled
I tried to fix it by installing PyTorch via conda, but I couldn't get conda installed on my computer (help with that would be appreciated), so I went with this solution:
pip uninstall torch torchvision
and then reinstalled with pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
After that, the program ingested and launched, and I was able to query my document.
BUT the program was working through my CPU only. I didn't touch anything in the code, and I didn't specify anything extra when running python run_localGPT.py.
Does anyone know where this issue comes from?
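A quick way to check whether the installed PyTorch build actually sees the GPU (a minimal sketch, independent of localGPT):

```python
import torch

# True only if the installed wheel was built with CUDA and a GPU is visible
print("CUDA available:", torch.cuda.is_available())

# CUDA version the wheel was built against (None means a CPU-only build)
print("Built for CUDA:", torch.version.cuda)

if torch.cuda.is_available():
    # Name of the first visible GPU, e.g. the RTX A6000
    print("Device:", torch.cuda.get_device_name(0))
```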
I'm also interested in this. I can't get it on the GPU for some reason.
In localGPT/run_localGPT.py:
- Add import torch and from transformers import AutoTokenizer, AutoModelForCausalLM at the beginning
- In the load_model() function, change LlamaTokenizer to AutoTokenizer
- Change LlamaForCausalLM to AutoModelForCausalLM
- Add the following options to the AutoModelForCausalLM.from_pretrained() call:
  - device_map='auto'
  - torch_dtype=torch.float16

Tested on the model TheBloke/Wizard-Vicuna-13B-Uncensored-HF on Hugging Face.
Will test it later today, I'll keep you guys updated!
Would you mind posting the functions? I tried to do that and it returned an error for me...
This is what I ended up doing:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaForCausalLM, LlamaTokenizer, pipeline
from langchain.llms import HuggingFacePipeline

gpu = True

def load_model():
    model_id = "TheBloke/vicuna-7B-1.1-HF"
    # model_id = "mayaeary/pygmalion-6b_dev-4bit-128g"
    # model_id = "TheBloke/wizardLM-7B-GPTQ"
    if gpu:
        # GPU path: let accelerate place the weights (device_map='auto')
        # and load in float16 to roughly halve the VRAM needed
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            device_map='auto',
            torch_dtype=torch.float16,
        )
    else:
        # Original CPU path
        tokenizer = LlamaTokenizer.from_pretrained(model_id)
        model = LlamaForCausalLM.from_pretrained(model_id)
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=2048,
        temperature=0,
        top_p=0.95,
        repetition_penalty=1.15,
    )
    local_llm = HuggingFacePipeline(pipeline=pipe)
    return local_llm
You will probably need a 24 GB GPU to run that model, though.
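For reference, a minimal usage sketch to confirm where the model ended up; the local_llm.pipeline.model.device attribute chain and calling the LLM directly with a string are assumptions about the langchain/transformers versions in use, so adjust if your versions differ:

```python
# Hypothetical quick check (attribute names and call style assumed, see note above)
local_llm = load_model()

# The underlying transformers model reports which device its weights are on;
# expect something like cuda:0 when the GPU is actually used
print(local_llm.pipeline.model.device)

# Simple end-to-end generation through the LangChain wrapper
print(local_llm("What does localGPT do?"))
```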
I solved a similar/the same issue by reinstalling torch:
pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade --force-reinstall
Source: adapted from https://stackoverflow.com/a/76144354/885761
None of these solutions worked for me; it's still running on the CPU. :(
Edit: sorry, I was being a noob, the model I ran doesn't work on GPU. So I changed it to a different model, and now my GPU is running at 100% from both Anaconda and WSL2.