
Model not loading on GPU

Open · AndreaLombax opened this issue 2 years ago · 1 comment

Hi, I'm having trouble with Mistral: the model is not loading on the GPU and only runs on the CPU.

Here's the code:

from ctransformers import AutoModelForCausalLM, Config, hub

config = Config()  # default settings; gpu_layers is passed separately below

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    model_file="mistral-7b-instruct-v0.1.Q5_K_M.gguf",
    config=hub.AutoConfig(config),
    model_type="mistral",
    gpu_layers=200,
)

print(llm("trying blabla"))

Versions:

CUDA: 12.2
libcudart12
NVIDIA drivers: 535.129.03
ctransformers: 0.2.27
transformers: 4.34.0
torch: 2.1.1
python: 3.10.13
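
As a sanity check independent of ctransformers, torch (which is in the environment listed above) can confirm that the CUDA driver and devices are visible from the same Python interpreter. A minimal sketch:

import torch

# Confirm that the CUDA runtime and driver are visible from this env,
# and list the devices torch can see.
print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))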

I have two NVIDIA A16 16GB GPUs, and the load is only 4 MB on each.
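
One way to confirm whether any layers are actually being offloaded is to read per-GPU memory use right before and after the load. A minimal sketch (not from the report above; it assumes nvidia-smi is on PATH):

import subprocess

from ctransformers import AutoModelForCausalLM

def gpu_mem_used_mib():
    # Query used VRAM per GPU; nvidia-smi prints one integer (MiB) per line.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return [int(line) for line in out.strip().splitlines()]

print("VRAM used before load (MiB):", gpu_mem_used_mib())
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    model_file="mistral-7b-instruct-v0.1.Q5_K_M.gguf",
    model_type="mistral",
    gpu_layers=200,
)
print("VRAM used after load (MiB):", gpu_mem_used_mib())

If the "after" numbers barely move, the layers stayed on the CPU.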

AndreaLombax · Nov 18 '23 10:11

I ran into this too. I can recreate it on Windows with a GTX 1660 Ti and a full Python 3.11 install from python.org by running the following:

# Create a clean venv
pip install ctransformers
pip install ctransformers[cuda]

The second installation pulls in the NVIDIA dependencies, but when running a very similar code snippet, the model never actually loads into GPU memory according to Task Manager.
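
A quick sanity check (a sketch; it assumes the [cuda] extra declares nvidia-cuda-runtime-cu12 and nvidia-cublas-cu12 as its dependencies) is to confirm the second install actually added those wheels:

from importlib.metadata import PackageNotFoundError, version

# Print the installed versions of ctransformers and of the NVIDIA runtime
# wheels the [cuda] extra is expected to pull in.
for dist in ("ctransformers", "nvidia-cuda-runtime-cu12", "nvidia-cublas-cu12"):
    try:
        print(dist, version(dist))
    except PackageNotFoundError:
        print(dist, "NOT INSTALLED")

If the wheels are present, the install step itself did what it promised and the problem is elsewhere.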

It does not appear to be tied to a specific model or installation method (I was scratching my head with the same model the OP used, pulled manually; the Dolphin Mistral 2.1 GGUF model, pulled manually; and several variants of Llama 2, pulled automatically using the Hugging Face pattern the author references in the README.md).

ryanmunz · Dec 17 '23 19:12