ctransformers
GPU is not used even after specifying gpu_layers
I have installed ctransformers with:
pip install ctransformers[cuda]
I am running the following piece of code:
from langchain.llms import CTransformers
config = {'max_new_tokens': 512, 'repetition_penalty': 1.1, 'context_length': 8000, 'temperature': 0, 'gpu_layers': 50}
llm = CTransformers(model="./codellama-7b.Q4_0.gguf", model_type="llama", gpu_layers=50, config=config)
Here the gpu_layers parameter is specified, yet the GPU is not used and the complete load stays on the CPU. Can someone please point out if there is a step missing?
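One thing worth ruling out first is that the CUDA-enabled build was actually installed: some shells (e.g. zsh) glob the unquoted `[cuda]` extra, and a plain CPU wheel then gets installed silently. The sketch below assumes a working local CUDA toolkit; `CT_CUBLAS=1` is the ctransformers from-source build flag analogous to the `CT_HIPBLAS=1` one mentioned further down this thread.

```shell
# Sketch: reinstall cleanly, quoting the extra so the shell
# does not expand the brackets (relevant under zsh).
pip uninstall -y ctransformers
pip install 'ctransformers[cuda]'

# Alternatively, force a from-source build against the local
# CUDA toolkit instead of the prebuilt wheel:
# CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers
```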
I am observing the same issue:
import torch
from ctransformers import AutoModelForCausalLM
local_model = 'Llama-2-7B-GGML'
llm = AutoModelForCausalLM.from_pretrained(local_model, model_file='llama-2-7b-chat.Q4_K_M.gguf', gpu_layers=50)
print("torch.cuda.memory_allocated: %fGB"%(torch.cuda.memory_allocated(0)/1024/1024/1024))
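A caveat with this particular check: `torch.cuda.memory_allocated` only reports memory allocated through PyTorch's own caching allocator. ctransformers allocates VRAM through its ggml/llama.cpp backend, not through PyTorch, so that counter can legitimately read 0 GB even when offload is working. A more reliable probe is `nvidia-smi`; the helper below is a minimal sketch (the function names `gpu_memory_used_mib` and `parse_memory_used` are mine, not a ctransformers API):

```python
import subprocess

def parse_memory_used(csv_output: str) -> list[int]:
    """Parse per-GPU 'memory.used' values (MiB) from the output of
    nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits
    (one integer per GPU, one per line)."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def gpu_memory_used_mib() -> list[int]:
    """Query nvidia-smi for current memory usage of each GPU, in MiB."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_memory_used(out)
```

Calling `gpu_memory_used_mib()` before and after `from_pretrained(...)` should show a jump of a few GB if the layers really landed on the GPU.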
I am seeing this too, using:
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
Same here. Still digging into it...