Results: 2 comments of frederickrohn
Hi, apologies for the basic question; I'm still a beginner with llama-cpp-python. I downloaded a quantized 7B model directly from the website, put it in the working directory, and loaded...
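A minimal sketch of the loading step being described, assuming llama-cpp-python is installed and a GGUF file sits in the working directory. The filename `model-q4.gguf` and the `n_ctx` value are hypothetical placeholders, not from the comment:

```python
# Sketch: load a quantized GGUF model with llama-cpp-python.
# Assumptions (hypothetical): the package is installed and a file
# named "model-q4.gguf" exists in the current working directory.
try:
    from llama_cpp import Llama
except ImportError:
    Llama = None  # library not installed; this stays a sketch

MODEL_PATH = "model-q4.gguf"  # hypothetical filename

def load_model(path=MODEL_PATH):
    """Load a quantized GGUF model from the working directory."""
    if Llama is None:
        raise RuntimeError("llama-cpp-python is not installed")
    # n_ctx sets the context window; 2048 is just an illustrative default.
    return Llama(model_path=path, n_ctx=2048, verbose=False)
```

Generation then works by calling the returned object with a prompt, e.g. `load_model()("Q: What is 2+2? A:", max_tokens=16)`.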
5 tokens per second is pretty fast; that's much better performance than what I was getting on the 8 GB M1 (about 20 words, so probably around 40 tokens, in...