GenAIExamples RAG is slow in ChatQnA demo on Xeon

Open NeoZhangJianyu opened this issue 6 months ago • 3 comments

I set up the demo based on ChatQnA (TGI) on Xeon (GNR) and tried RAG through the UI. After uploading a PDF file (2-5 MB), I ask a question. It takes 10-15 s.

When I upload a text file with 3 lines, it takes 2-3 s.

The customer found that the slowness is in the embedding stage.
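To confirm which stage dominates, each stage can be timed separately. The following is only a minimal sketch: `embed`, `retrieve`, and `generate` are hypothetical stand-ins (they just sleep), not the actual ChatQnA microservice calls, and would need to be replaced with the real requests to each service.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock time per stage, in seconds.
timings = {}

@contextmanager
def timed(stage):
    """Add the elapsed time of the enclosed block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start

# Hypothetical stand-ins for the real pipeline stages (assumptions, not
# the ChatQnA API). Replace the bodies with calls to the actual services.
def embed(texts):
    time.sleep(0.05)
    return [[0.0] * 4 for _ in texts]

def retrieve(query_vector):
    time.sleep(0.01)
    return ["retrieved chunk"]

def generate(question, context):
    time.sleep(0.02)
    return "answer"

def answer(question):
    with timed("embedding"):
        query_vector = embed([question])[0]
    with timed("retrieval"):
        context = retrieve(query_vector)
    with timed("generation"):
        result = generate(question, context)
    return result

answer("What does the PDF say?")

# Report each stage and identify the slowest one.
slowest = max(timings, key=timings.get)
for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.3f}s")
print(f"slowest stage: {slowest}")
```

Running the same measurement for the PDF case and for the 3-line text file case would show whether the extra time scales with document size in the embedding stage or comes from somewhere else.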

Aug 13 '24 09:08 NeoZhangJianyu