GenAIExamples
RAG is slow in ChatQnA demo on Xeon
I set up the demo based on ChatQnA (TGI) on Xeon (GNR) and tried RAG through the UI. After uploading a PDF file (2-5 MB), asking a question takes 10-15 s.
When I upload a text file with only 3 lines instead, it takes 2-3 s.
The customer found that the slowdown is in the embedding stage.
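To confirm that the embedding stage is the bottleneck, the embedding service can be timed on its own, outside the UI. The sketch below assumes the embedding backend is a Text Embeddings Inference (TEI) server exposing its `/embed` route. The URL and port in `TEI_EMBED_URL` are assumptions, so adjust them to match your deployment. The `embed` and `timed` helpers are illustrative names and are not part of ChatQnA.

```python
import json
import time
import urllib.request

# Assumption: host/port of the TEI embedding service; change to match your compose setup.
TEI_EMBED_URL = "http://localhost:6006/embed"


def embed(texts, url=TEI_EMBED_URL):
    """Send a batch of strings to a TEI /embed endpoint and return the vectors."""
    body = json.dumps({"inputs": texts}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `vectors, seconds = timed(embed, ["sample question"])` times a single query embedding. Timing a batch of chunks taken from the PDF, and comparing the result against the 3-line text file, shows whether embedding latency grows with the size of the upload.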