InfamyStudio
Bumping this, still experiencing it!
Did you make any headway on improving speeds, @chenle02? This is also what we want to achieve!
> You can get faster inference using GPU offloading. llama.cpp now supports mixing CPU and GPU execution. Even LangChain mentions it. That will significantly speed up inference. Are...
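
For anyone else landing here, this is roughly what that GPU offloading looks like through LangChain's `LlamaCpp` wrapper. A minimal sketch only: the model path and parameter values are placeholders, the import path varies by LangChain version, and llama-cpp-python must be built with GPU support (e.g. cuBLAS or Metal) for `n_gpu_layers` to have any effect:

```python
# Sketch of GPU offloading via LangChain's LlamaCpp wrapper.
# Requires llama-cpp-python compiled with GPU support.
from langchain_community.llms import LlamaCpp  # older versions: langchain.llms

llm = LlamaCpp(
    model_path="./models/your-model.gguf",  # placeholder path
    n_gpu_layers=32,  # layers to offload to the GPU; -1 offloads all of them
    n_batch=512,      # tokens processed per batch; tune to your VRAM
    n_ctx=2048,       # context window
    verbose=True,     # prints timings so you can compare against CPU-only runs
)

print(llm.invoke("Why is my CPU-only inference slow?"))
```

With `verbose=True` you can watch the per-token timings and confirm whether layers are actually landing on the GPU; if speeds don't change, the wheel was likely built CPU-only.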