genai-stack
genai-stack runs very slow when RAG is activated
genai-stack runs at a reasonable speed without RAG, but it becomes very slow once RAG is activated. Any advice on how to solve this? Thanks.
What LLM are you using? Is it faster if you switch to a smaller model, or to an OpenAI model? Some slowdown is expected, because with RAG the LLM is fed more tokens.
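To make the token-count point concrete, here is a rough sketch of how much a prompt grows once retrieved context is prepended to it. The 4-characters-per-token ratio is only a rule of thumb for English text, not the real llama2 tokenizer, and the question, the chunk sizes, and the function names are all hypothetical.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; an assumption, not a real tokenizer."""
    return max(1, round(len(text) / chars_per_token))


def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Prepend retrieved chunks to the question, as a RAG chain typically does."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


question = "How do I make my app respond faster?"
# Hypothetical retrieval result: four passages of 1,500 characters each.
chunks = ["lorem ipsum " * 125 for _ in range(4)]

plain_tokens = estimate_tokens(question)
rag_tokens = estimate_tokens(build_rag_prompt(question, chunks))
print(f"plain: ~{plain_tokens} tokens, with RAG: ~{rag_tokens} tokens")
```

On a CPU, the time spent processing the prompt grows with the number of input tokens, so a jump like this may account for much of the slowdown by itself.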
I am using:
- llama2 7b
- Ubuntu 22.04 LTS
- Docker Desktop for Windows (WSL2 enabled) v4.25.0
- a very high-end PC with server-class power
- the default configuration from the GitHub repo for everything else (no graphics card configuration)
I am experiencing the same issue, and I wonder whether there is any guide available for benchmarking and improving performance.
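In the absence of a guide, a minimal timing harness along these lines could be used to compare the two modes. It is only a sketch: `ask_without_rag` and `ask_with_rag` are hypothetical stand-ins that sleep instead of calling a model, and they would need to be replaced with real calls into your own chains.

```python
import statistics
import time
from typing import Callable


def time_calls(fn: Callable[[], object], runs: int = 3) -> dict[str, float]:
    """Call fn `runs` times and report wall-clock latency in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Hypothetical stand-ins so the sketch runs on its own.
def ask_without_rag() -> str:
    time.sleep(0.01)
    return "answer"


def ask_with_rag() -> str:
    time.sleep(0.03)
    return "answer with context"


baseline = time_calls(ask_without_rag)
with_rag = time_calls(ask_with_rag)
print(f"no RAG median: {baseline['median']:.3f}s, RAG median: {with_rag['median']:.3f}s")
```

Timing the same question several times in each mode, and reporting the median, helps separate one-off costs such as model loading from the steady-state cost of the longer prompt.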
Make sure you're running on GPU.
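One quick way to check this, assuming an NVIDIA card, is to see whether `nvidia-smi` is available and lists a device. The sketch below only tells you whether a GPU is visible from wherever the script runs (ideally inside the container that hosts the model); whether the stack actually uses it depends on configuration that is not covered here. The function name is hypothetical.

```python
import shutil
import subprocess


def nvidia_gpu_visible() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # driver tools not installed or not on PATH
    try:
        # `nvidia-smi -L` prints one line per detected GPU.
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("NVIDIA GPU visible:", nvidia_gpu_visible())
```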
Is there a way to minimize the configuration of genai-stack so that it runs at a reasonable speed without a GPU (it doesn't need to be super fast)? GPUs are expensive, and it would be good to get familiar with this stack before purchasing a GPU card. Thanks very much for any advice.