
RetrievalQA takes a long time to return an answer

Open hifiveszu opened this issue 1 year ago • 5 comments

I use the vector DB Chroma and LangChain's RetrievalQA to build my docs bot, but every question takes about 16 ~ 17 seconds. Does anybody have any ideas? Thanks

Here is my code:

embeddings = OpenAIEmbeddings()
vector_store = Chroma(persist_directory="docs_db", embedding_function=embeddings)

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 1}),
    return_source_documents=True,
    verbose=True,
)

result = qa({"query": keyword})

I searched LangChain's docs but found nothing. I also tried timing every step:


%%time
docs = vector_store.similarity_search(keyword, k=1)
# db costs: 2.204489231109619s

%%time
chain = load_qa_with_sources_chain(ChatOpenAI(temperature=0), chain_type="stuff")
# llm costs: 5.171542167663574s
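The `%%time` magic above only works inside a notebook. For timing the same stages in a plain script, a small context manager built from the standard library works too (a sketch; `timed` is a hypothetical helper, and the commented usage assumes `vector_store` and `qa` are built as in the snippet above):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print how long the wrapped block of code takes, tagged with a label."""
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    print(f"{label} costs: {elapsed:.3f}s")

# Usage with the stages from the issue:
# with timed("db"):
#     docs = vector_store.similarity_search(keyword, k=1)
# with timed("llm"):
#     result = qa({"query": keyword})
```

Timing the retrieval and the LLM call separately, as the reporter did, is the right instinct: it shows which of the two stages dominates the 16-17 seconds.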

hifiveszu avatar Apr 20 '23 04:04 hifiveszu

I had a similar issue. The biggest delay was the response from the OpenAI API, because it waits for the whole answer (the whole generation). You could try to use stream=True to get the first results faster.

alifanov avatar Apr 20 '23 07:04 alifanov

> I had a similar issue. The biggest delay was the response from the OpenAI API, because it waits for the whole answer (the whole generation). You could try to use stream=True to get the first results faster.

Thanks for the reply :) I have a self-hosted chatgpt-web. Each time I send my message directly to OpenAI, the server replies almost within one second.

So I don't know what happens inside LangChain's functions when running the following code:

result = llm_chain.run(context=context, question=question)

Is there any difference from calling OpenAI's API directly?

For now I would prefer to call OpenAI's API directly ...

hifiveszu avatar Apr 20 '23 10:04 hifiveszu

I've tried using the OpenAI API from scratch - it was also slow for big texts in the response.

alifanov avatar Apr 20 '23 10:04 alifanov

> I've tried using the OpenAI API from scratch - it was also slow for big texts in the response.

Using stream=True when calling OpenAI and streaming the response data out is the right answer : ) The user can see the answer immediately without waiting for the whole generation.
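The reason streaming feels so much faster can be sketched without calling the API at all: the total time to the full answer is unchanged, but the first token arrives almost immediately instead of after the whole generation. A minimal illustration, where `fake_token_stream` is a stand-in (an assumption for demonstration, not the OpenAI client) for the chunked response you get with stream=True:

```python
import time

def fake_token_stream(answer, delay=0.01):
    """Stand-in for the chunked response OpenAI sends with stream=True."""
    for token in answer.split():
        time.sleep(delay)  # simulated per-token generation time
        yield token + " "

def consume(stream):
    """Collect tokens as they arrive; report time-to-first-token and total time."""
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for token in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter() - start
        parts.append(token)
    total = time.perf_counter() - start
    return "".join(parts), first_token_at, total

text, ttft, total = consume(fake_token_stream("streaming shows partial answers early"))
print(f"first token after {ttft:.3f}s, full answer after {total:.3f}s")
```

With LangChain the equivalent knob is the streaming flag on the model (plus a callback handler to surface tokens as they arrive); with the raw OpenAI client it is the stream=True request parameter.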

hifiveszu avatar Apr 21 '23 06:04 hifiveszu

Where should the "stream=True" option be put? I can't find it in the LangChain documentation.

sebacarabajal avatar May 09 '23 17:05 sebacarabajal

https://python.langchain.com/en/latest/reference/modules/chat_models.html?highlight=streaming#langchain.chat_models.ChatOpenAI.streaming

alifanov avatar May 09 '23 20:05 alifanov

> Where should the "stream=True" option be put? I can't find it in the LangChain documentation.

ChatOpenAI(temperature=0, streaming=True)

xiongwn avatar Jun 01 '23 10:06 xiongwn

Hi, @hifiveszu! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, you were experiencing long retrieval times when using the RetrievalQA module with Chroma and LangChain. Another user suggested using stream=True to get faster results from the OpenAI API, and this solution has been confirmed by another user as well.

Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository!

dosubot[bot] avatar Sep 17 '23 17:09 dosubot[bot]