localGPT
Program is running on CPU while set to GPU
Running on a Threadripper + RTX A6000 with 48 GB of VRAM.
I did the installation, but I ran into some issues.
First one: I couldn't ingest because I got this error:
Torch not compiled with CUDA enabled
I tried to fix it by installing PyTorch via conda, but I couldn't get conda installed on my computer (help with that would be appreciated), so I went with this solution:
pip uninstall torch torchvision
and then reinstalled with pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
After that, the program ingested and launched, and I was able to query my document.
BUT the program was working through my CPU only. I didn't touch anything in the code, and I didn't specify anything extra when running python run_localGPT.py.
Does anyone know where this issue comes from?
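A quick way to check whether the installed PyTorch build actually sees the GPU (a minimal sketch, independent of localGPT):

```python
import torch

# True only if the installed wheel was built with CUDA and a GPU is visible
print("CUDA available:", torch.cuda.is_available())

# CUDA version the wheel was built against (None means a CPU-only build)
print("Built for CUDA:", torch.version.cuda)

if torch.cuda.is_available():
    # Name of the first visible GPU, e.g. the RTX A6000
    print("Device:", torch.cuda.get_device_name(0))
```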
I'm also interested in this. I can't get it on the GPU for some reason.
In localGPT/run_localGPT.py:
- Add import torch and from transformers import AutoTokenizer, AutoModelForCausalLM at the beginning
- In the load_model() function, change LlamaTokenizer to AutoTokenizer
- Change LlamaForCausalLM to AutoModelForCausalLM
- Add the following options to the AutoModelForCausalLM.from_pretrained() call:
  - device_map='auto'
  - torch_dtype=torch.float16

Tested on the model TheBloke/Wizard-Vicuna-13B-Uncensored-HF on Hugging Face.
Will test it later today, I'll keep you guys updated!
Would you mind posting the functions? I tried to do that and it returned an error for me...
This is what I ended up doing:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaForCausalLM, LlamaTokenizer, pipeline
from langchain.llms import HuggingFacePipeline

gpu = True

def load_model():
    model_id = "TheBloke/vicuna-7B-1.1-HF"
    # model_id = "mayaeary/pygmalion-6b_dev-4bit-128g"
    # model_id = "TheBloke/wizardLM-7B-GPTQ"
    if gpu:
        # GPU path: let accelerate place the weights (device_map='auto')
        # and load in float16 to roughly halve the VRAM needed
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            device_map='auto',
            torch_dtype=torch.float16,
        )
    else:
        # Original CPU path
        tokenizer = LlamaTokenizer.from_pretrained(model_id)
        model = LlamaForCausalLM.from_pretrained(model_id)
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=2048,
        temperature=0,
        top_p=0.95,
        repetition_penalty=1.15,
    )
    local_llm = HuggingFacePipeline(pipeline=pipe)
    return local_llm
You will probably need a 24 GB GPU to run that model, though.
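For reference, a minimal usage sketch to confirm where the model ended up; the local_llm.pipeline.model.device attribute chain and calling the LLM directly with a string are assumptions about the langchain/transformers versions in use, so adjust if your versions differ:

```python
# Hypothetical quick check (attribute names and call style assumed, see note above)
local_llm = load_model()

# The underlying transformers model reports which device its weights are on;
# expect something like cuda:0 when the GPU is actually used
print(local_llm.pipeline.model.device)

# Simple end-to-end generation through the LangChain wrapper
print(local_llm("What does localGPT do?"))
```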
I solved a similar/the same issue by reinstalling torch:
pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade --force-reinstall
Source: adapted from https://stackoverflow.com/a/76144354/885761
None of these solutions worked for me; it's still running on the CPU. :(
Edit: sorry, I was being a noob, the model I ran doesn't work on GPU. So I changed it to a different model, and now my GPU is running at 100% from both Anaconda and WSL2.