paper-qa
Using a local model has issues
Hi @whitead ,
I used the following code block with a quantized LLM from here: https://huggingface.co/Pi3141
The model I used: https://huggingface.co/Pi3141/gpt4-x-alpaca-native-13B-ggml
import paperscraper
from paperqa import Docs
from langchain.embeddings import LlamaCppEmbeddings
from langchain.llms import LlamaCpp

llm = LlamaCpp(model_path="ggml-model-q4_1.bin")
embeddings = LlamaCppEmbeddings(model_path="ggml-model-q4_1.bin")
docs = Docs(llm=llm, embeddings=embeddings)

keyword_search = 'bispecific antibody manufacture'
papers = paperscraper.search_papers(keyword_search, limit=2)
for path, data in papers.items():
    try:
        docs.add(path, chunk_chars=500)
    except ValueError as e:
        print('Could not read', path, e)

answer = docs.query("What manufacturing challenges are unique to bispecific antibodies?")
print(answer)
This ran for nearly 1.5 hours and then crashed with the error below.
Error log:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Input In [4], in <cell line: 12>()
10 print('Could not read', path, e)
11 print("Im here")
---> 12 answer = docs.query("What manufacturing challenges are unique to bispecific antibodies?")
13 print(answer)
14 end_time = time.time()
File ~/.local/lib/python3.10/site-packages/paperqa/docs.py:374, in Docs.query(self, query, k, max_sources, length_prompt, marginal_relevance, answer, key_filter)
372 loop = asyncio.new_event_loop()
373 asyncio.set_event_loop(loop)
--> 374 return loop.run_until_complete(
375 self.aquery(
376 query,
377 k=k,
378 max_sources=max_sources,
379 length_prompt=length_prompt,
380 marginal_relevance=marginal_relevance,
381 answer=answer,
382 key_filter=key_filter,
383 )
384 )
File ~/.local/lib/python3.10/site-packages/nest_asyncio.py:89, in _patch_loop.<locals>.run_until_complete(self, future)
86 if not f.done():
87 raise RuntimeError(
88 'Event loop stopped before Future completed.')
---> 89 return f.result()
File /usr/lib/python3.10/asyncio/futures.py:201, in Future.result(self)
199 self.__log_traceback = False
200 if self._exception is not None:
--> 201 raise self._exception.with_traceback(self._exception_tb)
202 return self._result
File /usr/lib/python3.10/asyncio/tasks.py:234, in Task.__step(***failed resolving arguments***)
232 result = coro.send(None)
233 else:
--> 234 result = coro.throw(exc)
235 except StopIteration as exc:
236 if self._must_cancel:
237 # Task is cancelled right before coro stops.
File ~/.local/lib/python3.10/site-packages/paperqa/docs.py:406, in Docs.aquery(self, query, k, max_sources, length_prompt, marginal_relevance, answer, key_filter)
404 answer.tokens += cb.total_tokens
405 answer.cost += cb.total_cost
--> 406 answer = await self.aget_evidence(
407 answer,
408 k=k,
409 max_sources=max_sources,
410 marginal_relevance=marginal_relevance,
411 key_filter=keys if key_filter else None,
412 )
413 context_str, contexts = answer.context, answer.contexts
414 bib = dict()
File ~/.local/lib/python3.10/site-packages/paperqa/docs.py:311, in Docs.aget_evidence(self, answer, k, max_sources, marginal_relevance, key_filter)
308 return None
310 with get_openai_callback() as cb:
--> 311 contexts = await asyncio.gather(*[process(doc) for doc in docs])
312 answer.tokens += cb.total_tokens
313 answer.cost += cb.total_cost
File /usr/lib/python3.10/asyncio/tasks.py:304, in Task.__wakeup(self, future)
302 def __wakeup(self, future):
303 try:
--> 304 future.result()
305 except BaseException as exc:
306 # This may also be a cancellation.
307 self.__step(exc)
File /usr/lib/python3.10/asyncio/tasks.py:232, in Task.__step(***failed resolving arguments***)
228 try:
229 if exc is None:
230 # We use the `send` method directly, because coroutines
231 # don't have `__iter__` and `__next__` methods.
--> 232 result = coro.send(None)
233 else:
234 result = coro.throw(exc)
File ~/.local/lib/python3.10/site-packages/paperqa/docs.py:299, in Docs.aget_evidence.<locals>.process(doc)
294 if doc.metadata["key"] in [c.key for c in answer.contexts]:
295 return None
296 c = Context(
297 key=doc.metadata["key"],
298 citation=doc.metadata["citation"],
--> 299 context=await self.summary_chain.arun(
300 question=answer.question,
301 context_str=doc.page_content,
302 citation=doc.metadata["citation"],
303 ),
304 text=doc.page_content,
305 )
306 if "Not applicable" not in c.context:
307 return c
File ~/.local/lib/python3.10/site-packages/langchain/chains/base.py:237, in Chain.arun(self, *args, **kwargs)
234 return (await self.acall(args[0]))[self.output_keys[0]]
236 if kwargs and not args:
--> 237 return (await self.acall(kwargs))[self.output_keys[0]]
239 raise ValueError(
240 f"`run` supported with either positional arguments or keyword arguments"
241 f" but not both. Got args: {args} and kwargs: {kwargs}."
242 )
File ~/.local/lib/python3.10/site-packages/langchain/chains/base.py:154, in Chain.acall(self, inputs, return_only_outputs)
152 else:
153 self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 154 raise e
155 if self.callback_manager.is_async:
156 await self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
File ~/.local/lib/python3.10/site-packages/langchain/chains/base.py:148, in Chain.acall(self, inputs, return_only_outputs)
142 self.callback_manager.on_chain_start(
143 {"name": self.__class__.__name__},
144 inputs,
145 verbose=self.verbose,
146 )
147 try:
--> 148 outputs = await self._acall(inputs)
149 except (KeyboardInterrupt, Exception) as e:
150 if self.callback_manager.is_async:
File ~/.local/lib/python3.10/site-packages/langchain/chains/llm.py:135, in LLMChain._acall(self, inputs)
134 async def _acall(self, inputs: Dict[str, Any]) -> Dict[str, str]:
--> 135 return (await self.aapply([inputs]))[0]
File ~/.local/lib/python3.10/site-packages/langchain/chains/llm.py:123, in LLMChain.aapply(self, input_list)
121 async def aapply(self, input_list: List[Dict[str, Any]]) -> List[Dict[str, str]]:
122 """Utilize the LLM generate method for speed gains."""
--> 123 response = await self.agenerate(input_list)
124 return self.create_outputs(response)
File ~/.local/lib/python3.10/site-packages/langchain/chains/llm.py:67, in LLMChain.agenerate(self, input_list)
65 """Generate LLM result from inputs."""
66 prompts, stop = await self.aprep_prompts(input_list)
---> 67 return await self.llm.agenerate_prompt(prompts, stop)
File ~/.local/lib/python3.10/site-packages/langchain/llms/base.py:113, in BaseLLM.agenerate_prompt(self, prompts, stop)
109 async def agenerate_prompt(
110 self, prompts: List[PromptValue], stop: Optional[List[str]] = None
111 ) -> LLMResult:
112 prompt_strings = [p.to_string() for p in prompts]
--> 113 return await self.agenerate(prompt_strings, stop=stop)
File ~/.local/lib/python3.10/site-packages/langchain/llms/base.py:229, in BaseLLM.agenerate(self, prompts, stop)
227 else:
228 self.callback_manager.on_llm_error(e, verbose=self.verbose)
--> 229 raise e
230 if self.callback_manager.is_async:
231 await self.callback_manager.on_llm_end(
232 new_results, verbose=self.verbose
233 )
File ~/.local/lib/python3.10/site-packages/langchain/llms/base.py:223, in BaseLLM.agenerate(self, prompts, stop)
217 self.callback_manager.on_llm_start(
218 {"name": self.__class__.__name__},
219 missing_prompts,
220 verbose=self.verbose,
221 )
222 try:
--> 223 new_results = await self._agenerate(missing_prompts, stop=stop)
224 except (KeyboardInterrupt, Exception) as e:
225 if self.callback_manager.is_async:
File ~/.local/lib/python3.10/site-packages/langchain/llms/base.py:334, in LLM._agenerate(self, prompts, stop)
332 generations = []
333 for prompt in prompts:
--> 334 text = await self._acall(prompt, stop=stop)
335 generations.append([Generation(text=text)])
336 return LLMResult(generations=generations)
File ~/.local/lib/python3.10/site-packages/langchain/llms/base.py:315, in LLM._acall(self, prompt, stop)
313 async def _acall(self, prompt: str, stop: Optional[List[str]] = None) -> str:
314 """Run the LLM on the given prompt and input."""
--> 315 raise NotImplementedError("Async generation not implemented for this LLM.")
NotImplementedError: Async generation not implemented for this LLM.
What I did was to:
- Get the llama 7B model here: https://github.com/cocktailpeanut/dalai
- Process it according to the instructions for the llama.cpp repository: https://github.com/ggerganov/llama.cpp
- Use the final models with paper-qa
You can try the example here first; it should be much faster than paper-qa, which makes many calls to the LLM: https://python.langchain.com/en/latest/modules/models/text_embedding/examples/llamacpp.html
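If it helps, here is roughly what that standalone check looks like (untested sketch; the model file name and path are placeholders for whatever your llama.cpp conversion produced):

from langchain.embeddings import LlamaCppEmbeddings
from langchain.llms import LlamaCpp

# Placeholder path: point this at your converted/quantized ggml model file.
model_path = "./models/7B/ggml-model-q4_0.bin"

llm = LlamaCpp(model_path=model_path)
embeddings = LlamaCppEmbeddings(model_path=model_path)

# One completion and one embedding, to confirm the model loads and runs
# before handing it to paper-qa.
print(llm("Q: What is a bispecific antibody? A:"))
print(len(embeddings.embed_query("bispecific antibody manufacture")))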
@Kohulan I'll take a look though - maybe langchain doesn't have an async fallback.
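In the meantime, one possible (untested) workaround sketch is to subclass LlamaCpp and add the missing async hook by running the blocking call in a thread executor. The AsyncLlamaCpp name is made up for illustration; everything else only uses the standard langchain LLM interface:

import asyncio
from functools import partial
from typing import List, Optional

from langchain.llms import LlamaCpp

class AsyncLlamaCpp(LlamaCpp):
    """LlamaCpp with a naive async fallback."""

    async def _acall(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Run the synchronous _call in the default thread executor so that
        # paper-qa's asyncio.gather over summary calls does not hit
        # NotImplementedError. Generation is still serialized by llama.cpp.
        loop = asyncio.get_event_loop()
        return await loop.run_in_executor(None, partial(self._call, prompt, stop))

llm = AsyncLlamaCpp(model_path="ggml-model-q4_1.bin")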
@kjelljorner Thank you. I had no issues with the models themselves; the problem was the missing async fallback. @whitead Thanks for looking into this. I will update this thread if I come up with a solution.
Langchain has this on their roadmap, so I think we'll just leave this open and wait for upstream.
@whitead Great thanks for the update!
I am having the exact same issue right now. Any update on this yet?
#92 should fix this
@whitead Great, thanks a lot! I will update you.
@whitead I am getting the following error while using LLaMA. I am using the same code shown in the local LLM example.
ImportError: cannot import name 'AsyncCallbackManager' from 'langchain.callbacks.base'
Any idea what is causing this?
Hello @fbyukgo and @Kohulan, we have just released version 5, which completely outsources all LLM management to https://github.com/BerriAI/litellm.
As such, I am going to close this out. If your issue persists, please open a new issue using paper-qa>=5.
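For anyone landing here later: local models in paper-qa>=5 are configured through litellm model strings rather than langchain objects. A rough sketch, assuming a local Ollama server on its default port; the model names, paper_directory value, and config layout are assumptions to adapt from the current paper-qa README:

from paperqa import Settings, ask

# Assumed setup: an Ollama server at localhost:11434 serving "llama3" and
# an embedding model; swap in whatever litellm-compatible models you run.
local_llm_config = {
    "model_list": [
        {
            "model_name": "ollama/llama3",
            "litellm_params": {
                "model": "ollama/llama3",
                "api_base": "http://localhost:11434",
            },
        }
    ]
}

answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(
        llm="ollama/llama3",
        llm_config=local_llm_config,
        summary_llm="ollama/llama3",
        summary_llm_config=local_llm_config,
        embedding="ollama/mxbai-embed-large",
        paper_directory="my_papers",  # assumed folder of PDFs to index
    ),
)
print(answer)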