Issue: Very long runtimes for RetrievalQA chain with GPT4All
RetrievalQA chain with GPT4All takes an extremely long time to run (doesn't end)
I encounter massive runtimes when running a RetrievalQA chain with a locally downloaded GPT4All LLM. Unsure what's causing this.
I pass a GPT4All model (loading ggml-gpt4all-j-v1.3-groovy.bin model that I downloaded locally) to the RetrievalQA Chain. I have one source text document and use sentence-transformers from HuggingFace for embeddings (I'm using a fairly small model: all-MiniLM-L6-v2).
from langchain.llms import GPT4All
from langchain.chains import RetrievalQA

llm = GPT4All(model=model_path, n_ctx=model_n_ctx, backend='gptj', callbacks=callbacks, verbose=True)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())
query = "How did the pandemic affect businesses"
ans = qa.run(query)
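For reference, the docsearch above is built along these lines (the vector store choice, file name, and chunk sizes here are illustrative, not my exact values):

from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# Load and split the single source document
docs = TextLoader("source.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Small sentence-transformers model for embeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
docsearch = Chroma.from_documents(chunks, embeddings)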
For some reason, running the chain on a query takes an extremely long time (>25 minutes).
Is this due to hardware limitations or something else? I'm able to run queries directly against the GPT4All model I downloaded locally fairly quickly (like the example shown here), which is why I'm unclear on what's causing this massive runtime.
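By "directly", I mean a plain completion call against the same model file, roughly:

llm = GPT4All(model=model_path, n_ctx=model_n_ctx, backend='gptj', verbose=True)
# A single prompt like this returns in a reasonable time on the same machine
print(llm("How did the pandemic affect businesses?"))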
Hardware: M1 Mac, macOS 12.1, 8 GB RAM, Python 3.10.11
Note: I also tried using load_qa_chain instead, with the same outcome (it hangs indefinitely).
Hi! I have exactly the same issue. Windows 10, 16 GB RAM, NVIDIA GPU, Python 3.10.
As far as I can tell, the GPU is not being used, and I can't find a way to run GPT4All on the GPU.
Hi, I have used ConversationalRetrievalChain and load_qa_chain with the gpt4all ggml-gpt4all-j-v1.3-groovy.bin model and am facing the same issue. Is there any solution to this long response time?
Hi, try changing the n_threads parameter in GPT4All(model=local_path, backend='gptj', verbose=True, temp=0.1, n_threads=4). With n_threads=4, it took 10-15 minutes to generate a response.
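To spell that out, the only change is the thread count passed to the GPT4All constructor. One option (my assumption, not something LangChain sets for you) is to start from the machine's CPU core count:

import os
from langchain.llms import GPT4All

# os.cpu_count() may report logical cores; the physical core count is often a better ceiling
llm = GPT4All(
    model=local_path,
    backend='gptj',
    verbose=True,
    temp=0.1,
    n_threads=os.cpu_count() or 4,
)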
Hi, @sidharthrajaram! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue you reported is about encountering long runtimes when running a RetrievalQA chain with a locally downloaded GPT4All LLM. It seems that other users have also reported the same issue and have been seeking assistance. One user suggested changing the n_threads parameter in the GPT4All function, which reportedly reduced the response time to 10-15 minutes.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project!
This is still an issue. The number of threads a system can run depends on the number of CPUs available, and a 10-15 minute response time with n_threads=4 is not acceptable for any real-world practical use case. Is increasing the number of CPUs the only solution, or are there other ways to get faster responses from GPT4All models with LangChain? I believe any real-world use case can tolerate only a few seconds of delay.
The issue still exists. Are there any alternatives? Thanks!
This is still an issue. It takes roughly 5 minutes on my system running Windows 11.