genai-stack runs very slow when RAG is activated

Open wishatch opened this issue 1 year ago • 5 comments

genai-stack works well, at a reasonable speed, without RAG. But when RAG is activated, it runs very slowly. Any advice on how to solve this? Thx

wishatch avatar Nov 15 '23 04:11 wishatch

What LLM are you using? Is it faster if you switch to a smaller one, or to an OpenAI one? It's expected to run slower with RAG because the LLM gets fed more tokens.
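To make the "more tokens" point concrete, here is a rough sketch of how retrieved context inflates the prompt. It is only an illustration: the 4-characters-per-token ratio is a crude heuristic rather than llama2's actual tokenizer, and `estimate_tokens`, `build_prompt`, the prompt layout, and the chunk sizes are all hypothetical, not taken from genai-stack.

```python
# Rough illustration of how retrieved context inflates the prompt an LLM must process.
# ASSUMPTION: ~4 characters per token is a crude heuristic, not llama2's real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def build_prompt(question: str, chunks: list[str]) -> str:
    """Put retrieved chunks (if any) ahead of the question, as a RAG prompt typically does."""
    if not chunks:
        return question
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


question = "How do I speed up genai-stack?"
# Hypothetical retrieval result: 4 chunks of 2000 characters each.
chunks = ["x" * 2000 for _ in range(4)]

plain = estimate_tokens(build_prompt(question, []))
rag = estimate_tokens(build_prompt(question, chunks))
print(f"without RAG: ~{plain} tokens, with RAG: ~{rag} tokens")
```

On CPU, the time to process the prompt grows with its length, so retrieving fewer or shorter chunks is one knob to try.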

oskarhane avatar Nov 15 '23 07:11 oskarhane

I am using:
- llama2 7b
- Ubuntu 22.04 LTS
- Docker Desktop for Windows (WSL2 enabled) v4.25.0
- a very high-end PC (server-class power)
- everything else per the default configuration from the GitHub repo (no graphics card configuration)

wishatch avatar Nov 15 '23 22:11 wishatch

I am experiencing the same issue, and wonder if there is any guide available to improve/benchmark the performance.

JasonPad19 avatar Dec 03 '23 20:12 JasonPad19

Make sure you're running on GPU.

oskarhane avatar Dec 04 '23 10:12 oskarhane

> Make sure you're running on GPU.

Is there a way to minimize the configuration of genai-stack so that it runs at a reasonable speed without a GPU (it doesn't need to be super fast)? GPUs are expensive. It would be good if I could get familiar with this stack first, before purchasing a GPU card. Thanks very much for any advice.

wishatch avatar Dec 04 '23 21:12 wishatch