StephenDWright comments

Results: 15 comments by StephenDWright

feat: Enable GPU acceleration

First of all, great contribution. I was looking out for this and was excited to see someone put it together so quickly. Unfortunately, I haven't got it to use my GPU....

feat: Enable GPU acceleration

Yes I am, currently a 12 GB 3060. I know you had to ask because there will always be someone who will try to run it on a Radeon Graphics...

feat: Enable GPU acceleration

@maozdemir I see BLAS = 0. I am assuming you are referring to that. This is the output to the terminal. Thanks for taking the time to troubleshoot btw. Using...

Before I do that, I did it again yesterday. This was some of the output while building after running this command:
$Env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"; $Env:FORCE_CMAKE=1; py ./setup.py install
I took this output...

feat: Enable GPU acceleration

@johnbrisbin Thank you for the feedback. I am also trying to run it in VS Code, in a venv. I have deleted the folder and environment and cloned so many...

feat: Enable GPU acceleration

Really brilliant. Even though I am about to give up on getting the GPU to work for now after an evening of trying, it is still a great addition. 👍👍

VERY BIG performance improvement and beautiful features

@DanielusG I added n_batch=2000 and the performance increase was phenomenal! You are right and I am blown away. The prompt eval time moved from this: llama_print_timings: prompt eval time =...

VERY BIG performance improvement and beautiful features

> > If I am not mistaken, 87% increase in speed. It moved from 24 seconds to 3 seconds > > Yes, you are right, but expressed like this doesn't...
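
For anyone following the numbers in this exchange, here is a minimal sketch of the arithmetic, assuming the rounded timings quoted above (24 seconds before, 3 seconds after). The helper names are hypothetical and only for illustration. It shows that the same measurement can be read either as a reduction in time or as a speedup factor, and the two figures are very different.

```python
# Sketch only: the 24 s and 3 s figures are the rounded values quoted in the
# comment above; the function names are hypothetical helpers, not a library API.

def time_reduction_pct(before_s: float, after_s: float) -> float:
    """Percentage of the original time that was saved."""
    return (before_s - after_s) / before_s * 100.0


def speedup_factor(before_s: float, after_s: float) -> float:
    """How many times faster the new run is than the old one."""
    return before_s / after_s


before_s, after_s = 24.0, 3.0

print(f"time reduction: {time_reduction_pct(before_s, after_s):.1f}%")  # 87.5%
print(f"speedup: {speedup_factor(before_s, after_s):.1f}x")             # 8.0x
```

So "87%" describes how much shorter the prompt eval time became, while the speed itself went up roughly eightfold.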

ggml-old-vic13b-q5_1.bin not supported

Just for comparison, I am using Wizard Vicuna 13GB ggml, but I am using it with the GPU implementation, where some of the work gets offloaded. Answers take about 4-5...

ggml-old-vic13b-q5_1.bin not supported

> > Just for comparison, I am using Wizard Vicuna 13GB ggml, but I am using it with the GPU implementation, where some of the work gets offloaded. Answers take...