
RetrievalQA takes a long time to return an answer

Open hifiveszu opened this issue 1 year ago • 5 comments

I use the vector DB Chroma and LangChain's RetrievalQA to build my docs bot, but every question takes about 16 ~ 17 seconds. Does anybody have any ideas? Thanks

Here is my code:

embeddings = OpenAIEmbeddings()
vector_store = Chroma(persist_directory="docs_db", embedding_function=embeddings)

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 1}),
    return_source_documents=True,
    verbose=True,
)

result = qa({"query": keyword})

I searched LangChain's docs but found nothing. I also tried timing every step:


%%time
docs = vector_store.similarity_search(keyword, k=1)
# db costs: 2.204489231109619s

%%time
chain = load_qa_with_sources_chain(ChatOpenAI(temperature=0), chain_type="stuff")
# llm costs: 5.171542167663574s
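The `%%time` magic above only works inside a notebook. For timing the same stages in a plain script, a small context manager built from the standard library works too (a sketch; `timed` is a hypothetical helper, and the commented usage assumes `vector_store` and `qa` are built as in the snippet above):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print how long the wrapped block of code takes, tagged with a label."""
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    print(f"{label} costs: {elapsed:.3f}s")

# Usage with the stages from the issue:
# with timed("db"):
#     docs = vector_store.similarity_search(keyword, k=1)
# with timed("llm"):
#     result = qa({"query": keyword})
```

Timing the retrieval and the LLM call separately, as the reporter did, is the right instinct: it shows which of the two stages dominates the 16-17 seconds.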

hifiveszu avatar Apr 20 '23 04:04 hifiveszu

I had a similar issue. The biggest delay was the response from the OpenAI API, because it waits for the whole answer (the whole generation). You could try to use stream=True to get the first results faster.

alifanov avatar Apr 20 '23 07:04 alifanov

> I had a similar issue. The biggest delay was the response from the OpenAI API, because it waits for the whole answer (the whole generation). You could try to use stream=True to get the first results faster.

Thanks for the reply :) I have a self-hosted chatgpt-web. Each time I send my message directly to OpenAI, the server replies almost within one second.

So I don't know what happens inside LangChain's functions when running the following code:

result = llm_chain.run(context=context, question=question)

Is there any difference from calling OpenAI's API directly?

For now I would prefer to call OpenAI's API directly ...

hifiveszu avatar Apr 20 '23 10:04 hifiveszu

I've tried using the OpenAI API from scratch - it was also slow for big texts in the response.

alifanov avatar Apr 20 '23 10:04 alifanov

> I've tried using the OpenAI API from scratch - it was also slow for big texts in the response.

Using stream=True when calling OpenAI and streaming the response data out is the right answer : ) The user can see the answer immediately without waiting for the whole generation.
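The reason streaming feels so much faster can be sketched without calling the API at all: the total time to the full answer is unchanged, but the first token arrives almost immediately instead of after the whole generation. A minimal illustration, where `fake_token_stream` is a stand-in (an assumption for demonstration, not the OpenAI client) for the chunked response you get with stream=True:

```python
import time

def fake_token_stream(answer, delay=0.01):
    """Stand-in for the chunked response OpenAI sends with stream=True."""
    for token in answer.split():
        time.sleep(delay)  # simulated per-token generation time
        yield token + " "

def consume(stream):
    """Collect tokens as they arrive; report time-to-first-token and total time."""
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for token in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter() - start
        parts.append(token)
    total = time.perf_counter() - start
    return "".join(parts), first_token_at, total

text, ttft, total = consume(fake_token_stream("streaming shows partial answers early"))
print(f"first token after {ttft:.3f}s, full answer after {total:.3f}s")
```

With LangChain the equivalent knob is the streaming flag on the model (plus a callback handler to surface tokens as they arrive); with the raw OpenAI client it is the stream=True request parameter.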

hifiveszu avatar Apr 21 '23 06:04 hifiveszu

Where should the "stream=True" option be put? I can't find it in the LangChain documentation.

sebacarabajal avatar May 09 '23 17:05 sebacarabajal

https://python.langchain.com/en/latest/reference/modules/chat_models.html?highlight=streaming#langchain.chat_models.ChatOpenAI.streaming

alifanov avatar May 09 '23 20:05 alifanov

> Where should the "stream=True" option be put? I can't find it in the LangChain documentation.

ChatOpenAI(temperature=0, streaming=True)

xiongwn avatar Jun 01 '23 10:06 xiongwn

Hi, @hifiveszu! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, you were experiencing long retrieval times when using the RetrievalQA module with Chroma and LangChain. Another user suggested using stream=True to get faster results from the OpenAI API, and this solution has been confirmed by another user as well.

Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository!

dosubot[bot] avatar Sep 17 '23 17:09 dosubot[bot]