Issue: Very long runtimes for RetrievalQA chain with GPT4All
RetrievalQA chain with GPT4All takes an extremely long time to run (doesn't end)
I encounter massive runtimes when running a RetrievalQA chain with a locally downloaded GPT4All LLM. Unsure what's causing this.
I pass a GPT4All model (loading ggml-gpt4all-j-v1.3-groovy.bin model that I downloaded locally) to the RetrievalQA Chain. I have one source text document and use sentence-transformers from HuggingFace for embeddings (I'm using a fairly small model: all-MiniLM-L6-v2).
from langchain.llms import GPT4All
from langchain.chains import RetrievalQA

llm = GPT4All(model=model_path, n_ctx=model_n_ctx, backend='gptj', callbacks=callbacks, verbose=True)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())
query = "How did the pandemic affect businesses"
ans = qa.run(query)
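For reference, the docsearch above is built along these lines (the vector store choice, file name, and chunk sizes here are illustrative, not my exact values):

from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# Load and split the single source document
docs = TextLoader("source.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Small sentence-transformers model for embeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
docsearch = Chroma.from_documents(chunks, embeddings)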
For some reason, running the chain on a query takes an extremely long time (>25 minutes).
Is this due to hardware limitations or something else? I'm able to run queries directly against the GPT4All model I downloaded locally fairly quickly (like the example shown here), which is why I'm unclear on what's causing this massive runtime.
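By "directly", I mean a plain completion call against the same model file, roughly:

llm = GPT4All(model=model_path, n_ctx=model_n_ctx, backend='gptj', verbose=True)
# A single prompt like this returns in a reasonable time on the same machine
print(llm("How did the pandemic affect businesses?"))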
Hardware: M1 Mac, macOS 12.1, 8 GB RAM, Python 3.10.11
Note: I also tried using load_qa_chain instead, with the same outcome (it hangs indefinitely).
Hi! I have exactly the same issue. Windows 10, 16 GB RAM, NVIDIA GPU, Python 3.10.
As far as I can tell, the GPU is not being used, and I can't find a way to run GPT4All on the GPU.
Hi, I have used ConversationalRetrievalChain and load_qa_chain with the gpt4all ggml-gpt4all-j-v1.3-groovy.bin model and am facing the same issue. Is there any solution to this long response time?
Hi, try changing the n_threads parameter in GPT4All(model=local_path, backend='gptj', verbose=True, temp=0.1, n_threads=4). With n_threads=4, it took 10-15 minutes to generate a response.
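To spell that out, the only change is the thread count passed to the GPT4All constructor. One option (my assumption, not something LangChain sets for you) is to start from the machine's CPU core count:

import os
from langchain.llms import GPT4All

# os.cpu_count() may report logical cores; the physical core count is often a better ceiling
llm = GPT4All(
    model=local_path,
    backend='gptj',
    verbose=True,
    temp=0.1,
    n_threads=os.cpu_count() or 4,
)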
Hi, @sidharthrajaram! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue you reported is about encountering long runtimes when running a RetrievalQA chain with a locally downloaded GPT4All LLM. It seems that other users have also reported the same issue and have been seeking assistance. One user suggested changing the n_threads parameter in the GPT4All function, which reportedly reduced the response time to 10-15 minutes.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project!
This is still an issue. The number of threads a system can run depends on the number of CPUs available, and a 10-15 minute response time with n_threads=4 is not acceptable for any real-world practical use case. Is increasing the number of CPUs the only solution, or are there other ways to get faster responses from GPT4All models with LangChain? I believe any real-world use case can tolerate only a few seconds of delay.
The issue still exists. Are there any alternatives? Thanks!
This is still an issue. It takes roughly 5 minutes on my system running Windows 11.