langchain
RetrievalQA takes a long time to return an answer
I use the Chroma vector DB
and LangChain's RetrievalQA
to build my docs bot, but every question takes about 16~17 seconds.
Does anybody have any ideas? Thanks.
Here is my code:
embeddings = OpenAIEmbeddings()
vector_store = Chroma(persist_directory="docs_db", embedding_function=embeddings)
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 1}),
    return_source_documents=True,
    verbose=True,
)
result = qa({"query": keyword})
I searched LangChain's docs but found no fix, so I tried timing every step:
%%time
docs = vector_store.similarity_search(keyword, k=1)
db costs: 2.204489231109619s
%%time
chain = load_qa_with_sources_chain(ChatOpenAI(temperature=0), chain_type="stuff")
llm costs: 5.171542167663574s
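Outside a notebook, the same per-step timing can be done with a small helper instead of %%time. A minimal sketch (the commented-out lines show where the real calls from this thread would be swapped in; `timed` is a hypothetical helper, not a LangChain API):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print how long the wrapped block took, mirroring the %%time cells above."""
    start = time.perf_counter()
    yield
    print(f"{label} costs: {time.perf_counter() - start}s")

# Usage with the real calls from this thread would look like:
# with timed("db"):
#     docs = vector_store.similarity_search(keyword, k=1)
# with timed("llm"):
#     result = qa({"query": keyword})

# Self-contained demo so the sketch runs as-is:
with timed("demo"):
    time.sleep(0.1)
```

Timing each stage separately like this makes it clear whether the vector search or the LLM call dominates the 16~17 seconds.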
I had a similar issue. The biggest delay was the response from the OpenAI API, because it waits for the whole answer (the whole generation). You could try stream=True to get the first results faster.
Thanks for the reply :) I have a self-hosted chatgpt-web. Each time I send my message directly to OpenAI, the server replies almost within one second.
So I don't know what runs inside LangChain's functions when executing the following code:
result = llm_chain.run(context=context, question=question)
Is there any difference from calling OpenAI's API directly?
For now I would prefer to call OpenAI's API directly...
I've tried using the OpenAI API from scratch; it was also slow for big texts in the response.
Using stream=True
when calling OpenAI and streaming the response out is the right answer :)
The user can see the answer immediately without waiting for the whole generation.
Where should the "stream=True" option go? I can't find it in the LangChain documentation.
https://python.langchain.com/en/latest/reference/modules/chat_models.html?highlight=streaming#langchain.chat_models.ChatOpenAI.streaming
ChatOpenAI(temperature=0, streaming=True)
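To illustrate why streaming feels faster even though total generation time is unchanged, here is a minimal pure-Python sketch with no API call; `fake_llm_stream` is a stand-in for the streamed OpenAI response, not a LangChain function:

```python
import time

def fake_llm_stream(tokens, delay=0.02):
    """Stand-in for a streaming LLM response: yields one token at a time
    with a simulated per-token generation/network delay."""
    for tok in tokens:
        time.sleep(delay)
        yield tok

tokens = "The answer is streamed token by token".split()
start = time.perf_counter()
first_token_at = None
answer = []
for tok in fake_llm_stream(tokens):
    if first_token_at is None:
        # Time to first token: this is what the user perceives as latency.
        first_token_at = time.perf_counter() - start
    answer.append(tok)
total = time.perf_counter() - start
# With streaming, the user starts reading after roughly one token's delay;
# without it, they wait the full `total` (~n_tokens * delay) for the answer.
```

The total time is the same either way; streaming only moves the first visible output much earlier, which is why the thread's 16~17 second wait feels so much shorter with streaming=True.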
Hi, @hifiveszu! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
Based on my understanding, you were experiencing long retrieval times when using the RetrievalQA module with Chroma and LangChain. Another user suggested using stream=True to get faster results from the OpenAI API, and this solution has been confirmed by another user as well.
Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your contribution to the LangChain repository!