Slow response time with `ConversationalRetrievalQAChain`
Issue you'd like to raise.
Hello,
I am seeing slow response times (25-30 seconds per question) with `ConversationalRetrievalQAChain` and Pinecone.
```ts
const chain = ConversationalRetrievalQAChain.fromLLM(
  this.llm,
  vectorStore.asRetriever(),
);

const res = await chain.call({ question, chat_history: [''] });
```
About 95% of that time elapses after `chain.call` is invoked. I have tried both the gpt-3.5-turbo and gpt-4 models and see similar response times with each.
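For reference, a minimal sketch of how that can be confirmed, just plain wall-clock timing around the same call as above:

```ts
// Wall-clock timing around the call above, to confirm the latency is
// inside chain.call rather than in chain construction.
const t0 = Date.now();
const res = await chain.call({ question, chat_history: [''] });
console.log(`chain.call took ${Date.now() - t0} ms`);
```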
I've also tried turning on streaming, and with gpt-3.5-turbo nothing is streamed for the first 20 seconds or so; once it starts streaming, it is faster than gpt-4. gpt-4 takes much less time to start streaming, but is then slower to complete the answer.
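For anyone reproducing this, here is a rough sketch of the streaming setup I mean, capturing time-to-first-token separately from total time. Import paths and the callbacks argument of `call` may differ depending on your LangChain.js version, and `vectorStore` / `question` are assumed to exist as in the snippet above:

```ts
import { ChatOpenAI } from "langchain/chat_models/openai";
import { ConversationalRetrievalQAChain } from "langchain/chains";

// Sketch: measure time-to-first-token vs. total time with streaming enabled.
const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo", streaming: true });
const chain = ConversationalRetrievalQAChain.fromLLM(llm, vectorStore.asRetriever());

const start = Date.now();
let firstToken: number | undefined;

await chain.call({ question, chat_history: [''] }, [
  {
    handleLLMNewToken() {
      // Record how long the chain was silent before streaming began.
      firstToken ??= Date.now() - start;
    },
  },
]);

console.log(`first token after ${firstToken} ms, total ${Date.now() - start} ms`);
```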
Any help would be appreciated, thank you!
I've sometimes run into the same issue as well. PR #5066 might help a bit in some cases.
Hi, @gzimh! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, you are experiencing slow response times when using `ConversationalRetrievalQAChain` and Pinecone. You have already tried different models and streaming, but the issue still persists. jpzhangvincent suggested that PR #5066 might help in some cases.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project!
@dosubot let's reopen this issue, as PR https://github.com/langchain-ai/langchain/pull/5066 isn't merged yet.
The problem is not solved.