
Can we use a GPU for increased speed and a bigger, better Llama 2 model?

Open stevedipaola opened this issue 1 year ago • 4 comments

Are there instructions for that? Most of us AI folks have good GPUs; it seems silly not to use them.

stevedipaola avatar Jul 25 '23 01:07 stevedipaola

Two changes are needed. First, add `gpu_layers` to the CTransformers config in `build_llm()`:

```python
def build_llm():
    # Local CTransformers model
    llm = CTransformers(model=cfg.MODEL_BIN_PATH,
                        model_type=cfg.MODEL_TYPE,
                        config={'max_new_tokens': cfg.MAX_NEW_TOKENS,
                                'temperature': cfg.TEMPERATURE,
                                'gpu_layers': 24})
    return llm
```

Second, uninstall `ctransformers` and reinstall it with cuBLAS support:

```shell
pip uninstall ctransformers
CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers
```

alior101 avatar Aug 01 '23 15:08 alior101
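The change above can be sketched as a self-contained snippet. The config values below are illustrative placeholders, not the repo's actual `cfg` module, and the model path is hypothetical:

```python
# Sketch of the GPU-enabled build_llm arguments, assuming the cuBLAS build
# of ctransformers is installed (CT_CUBLAS=1). All values are illustrative.
MODEL_BIN_PATH = "models/llama-2-7b-chat.bin"  # hypothetical path
MODEL_TYPE = "llama"
MAX_NEW_TOKENS = 256
TEMPERATURE = 0.01
GPU_LAYERS = 24  # number of transformer layers to offload to the GPU


def build_llm_kwargs():
    """Collect the keyword arguments that would be passed to CTransformers()."""
    return {
        "model": MODEL_BIN_PATH,
        "model_type": MODEL_TYPE,
        "config": {
            "max_new_tokens": MAX_NEW_TOKENS,
            "temperature": TEMPERATURE,
            "gpu_layers": GPU_LAYERS,
        },
    }
```

Raising `GPU_LAYERS` offloads more of the model to VRAM; set it as high as your GPU memory allows for the biggest speedup.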

> two changes needed: set `'gpu_layers': 24` in the `CTransformers` config in `build_llm()`, then reinstall with `CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers`

Thanks @alior101, it worked! How would it integrate with poetry and pyproject.toml?

gabacode avatar Aug 10 '23 21:08 gabacode

Hello, I made the changes to `llm.py`:

```python
llm = CTransformers(model=cfg.MODEL_BIN_PATH,
                    model_type=cfg.MODEL_TYPE,
                    config={'max_new_tokens': cfg.MAX_NEW_TOKENS,
                            'temperature': cfg.TEMPERATURE,
                            'gpu_layers': 24})
```

and reinstalled ctransformers (0.2.27), but it still runs very slowly and never seems to use the GPU.

I already verified the GPU is available: `python -c "import torch; print(torch.cuda.is_available())"` prints `True`.

Then I tried `python main.py "hello"` and it takes 300+ seconds to answer. Please advise.

alexng88 avatar Nov 19 '23 06:11 alexng88
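One caveat worth checking: `torch.cuda.is_available()` only proves that PyTorch sees the GPU; it says nothing about whether `ctransformers` was actually built with cuBLAS. A rough way to tell is to poll GPU utilization with `nvidia-smi` while `main.py` is answering; if it stays at 0%, no layers are being offloaded. The helper below is an illustrative sketch, not part of the repo:

```python
# Hypothetical diagnostic helper: sample GPU utilization via nvidia-smi.
# Run it in a second terminal while the model is generating; ~0% utilization
# suggests the ctransformers wheel in use has no CUDA support.
import shutil
import subprocess


def gpu_utilization():
    """Return per-GPU utilization percentages, or None if nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.split() if line]
```

If utilization is zero, repeating the `CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers` step in the exact environment that runs `main.py` is the first thing to try.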

Hi @alexng88 ,

I am also facing the same issue. Did you find a solution?

VIGHNESH1521 avatar Apr 08 '24 04:04 VIGHNESH1521