Llama-2-Open-Source-LLM-CPU-Inference
Can we use a GPU for increased speed and to run a bigger, better Llama 2 model?
Are there instructions for that? Most of us AI folks have good GPUs; it seems silly not to use them.
Two changes are needed:

1. In `llm.py`, pass `gpu_layers` in the model config so layers are offloaded to the GPU:

```python
def build_llm():
    # Local CTransformers model, with 24 layers offloaded to the GPU
    llm = CTransformers(model=cfg.MODEL_BIN_PATH,
                        model_type=cfg.MODEL_TYPE,
                        config={'max_new_tokens': cfg.MAX_NEW_TOKENS,
                                'temperature': cfg.TEMPERATURE,
                                'gpu_layers': 24})
    return llm
```

2. Uninstall ctransformers and reinstall it built with cuBLAS (CUDA) support:

```sh
CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers
```
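If it helps, here is a minimal standalone sketch (not from this repo) for checking that the CUDA build actually offloads layers, using ctransformers' own `AutoModelForCausalLM` API; the model path below is hypothetical, so substitute your own `cfg.MODEL_BIN_PATH`:

```python
from ctransformers import AutoModelForCausalLM

# Hypothetical local GGML model path; replace with your own model file.
llm = AutoModelForCausalLM.from_pretrained(
    "models/llama-2-7b-chat.ggmlv3.q8_0.bin",
    model_type="llama",
    gpu_layers=24,  # number of transformer layers to offload to the GPU
)

# If the cuBLAS build is active, `nvidia-smi` in another terminal should
# show GPU memory allocated while this generates.
print(llm("Hello"))
```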
Thanks @alior101, it worked! How would this integrate with Poetry and pyproject.toml?
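One possible approach (an untested sketch, not an official recipe): since the cuBLAS build needs the `CT_CUBLAS=1` environment variable at build time, you could rebuild the package inside the Poetry-managed virtualenv with pip directly:

```sh
# Rebuild ctransformers with cuBLAS inside the Poetry virtualenv;
# --force-reinstall replaces the CPU-only wheel Poetry installed.
CT_CUBLAS=1 poetry run pip install ctransformers --no-binary ctransformers --force-reinstall
```

The trade-off is that the lock file won't record the custom build, so the command would have to be re-run after each `poetry install`.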
Hello, I made the changes to llm.py:

```python
llm = CTransformers(model=cfg.MODEL_BIN_PATH,
                    model_type=cfg.MODEL_TYPE,
                    config={'max_new_tokens': cfg.MAX_NEW_TOKENS,
                            'temperature': cfg.TEMPERATURE,
                            'gpu_layers': 24})
```

and reinstalled ctransformers (0.2.27), but it still runs very slowly and never seems to use the GPU.
I already verified that the GPU is ready:

```sh
python -c "import torch; print(torch.cuda.is_available())"
```

It printed "True". Then I ran `python main.py "hello"` and it took 300+ seconds to answer. Please advise.
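Note that `torch.cuda.is_available()` only confirms that PyTorch can see the GPU; it says nothing about whether the ctransformers wheel itself was rebuilt with cuBLAS. A quick way to check (a hedged suggestion, not from the repo) is to watch GPU memory while the model generates:

```sh
# Run in a second terminal while `python main.py "hello"` is answering.
# If no process shows GPU memory in use, ctransformers is still the
# CPU-only wheel and needs the CT_CUBLAS=1 reinstall above.
watch -n 1 nvidia-smi
```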
Hi @alexng88,
I am also facing the same issue. Did you find a solution?