InfamyStudio

3 comments of InfamyStudio

Did you make any headway on improving speeds, @chenle02? This is also what we want to achieve!

> You can get faster inference using GPU offloading. llama.cpp now supports mixing CPU and GPU execution, and even LangChain mentions it. That will significantly speed up inference. Are...
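
For anyone landing here, a minimal sketch of what the quoted GPU offloading looks like with llama-cpp-python (the model path and layer count below are placeholder assumptions, not values from this thread; it requires a llama.cpp build compiled with GPU support, e.g. CUDA or Metal):

```python
# Minimal sketch: partial GPU offloading with llama-cpp-python.
# Assumptions: a GGUF model at the placeholder path below, and a
# llama.cpp build compiled with GPU support (CUDA/Metal).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # placeholder path, not from this thread
    n_gpu_layers=35,  # layers offloaded to the GPU; 0 = CPU only, -1 = all
    n_ctx=2048,       # context window size
)

out = llm("Q: Why does GPU offloading speed up inference? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

`n_gpu_layers` trades VRAM for speed: set it to however many transformer layers fit on your GPU and the rest stay on the CPU. LangChain's `LlamaCpp` wrapper exposes the same `n_gpu_layers` parameter if you're going through LangChain instead.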