StephenDWright
> @StephenDWright I also tried to enable GPU acceleration, but it was not successful. Can you give me more details? Thank you. I followed the instructions in the thread. in...
> @StephenDWright I didn't encounter any errors, but the GPU just doesn't work, and I feel like my CUDA toolkit hasn't been fixed properly :( > What kind of GPU...
> @StephenDWright thanks for the follow-up. I was wondering what type of graphics card I would need to make this somehow usable. I don't think my GPU would handle...
You can get faster inference using GPU offloading. llama.cpp now supports splitting the work between CPU and GPU, and even LangChain mentions the option. That will significantly speed up inference.
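As a rough sketch of what that offloading looks like with the llama.cpp CLI (the model path, prompt, and layer count below are placeholders, and the binary must be built with GPU support, e.g. the CUDA/cuBLAS backend):

```shell
# -m   path to a local GGUF model (placeholder path)
# -p   prompt, -n number of tokens to generate
# -ngl / --n-gpu-layers: how many transformer layers to offload
#       to the GPU; the rest stay on the CPU. Raise this until
#       VRAM is full for the biggest speedup.
./main -m ./models/model.Q4_K_M.gguf -p "Hello" -n 128 -ngl 35
```

If you see no speedup, check the startup log: a GPU-enabled build prints the detected device and how many layers were actually offloaded, which is the quickest way to confirm the CUDA toolkit is being picked up.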