paper-qa
Using a local model has issues
Hi @whitead ,
I used the following code block with a quantized LLM from here: https://huggingface.co/Pi3141
The model I used: https://huggingface.co/Pi3141/gpt4-x-alpaca-native-13B-ggml
import paperscraper
from paperqa import Docs
from langchain.embeddings import LlamaCppEmbeddings
from langchain.llms import LlamaCpp

llm = LlamaCpp(model_path="ggml-model-q4_1.bin")
embeddings = LlamaCppEmbeddings(model_path="ggml-model-q4_1.bin")
docs = Docs(llm=llm, embeddings=embeddings)

keyword_search = 'bispecific antibody manufacture'
papers = paperscraper.search_papers(keyword_search, limit=2)
for path, data in papers.items():
    try:
        docs.add(path, chunk_chars=500)
    except ValueError as e:
        print('Could not read', path, e)

answer = docs.query("What manufacturing challenges are unique to bispecific antibodies?")
print(answer)
This ran for nearly 1.5 hours and then crashed with the error below.
Error log:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Input In [4], in <cell line: 12>()
10 print('Could not read', path, e)
11 print("Im here")
---> 12 answer = docs.query("What manufacturing challenges are unique to bispecific antibodies?")
13 print(answer)
14 end_time = time.time()
File ~/.local/lib/python3.10/site-packages/paperqa/docs.py:374, in Docs.query(self, query, k, max_sources, length_prompt, marginal_relevance, answer, key_filter)
372 loop = asyncio.new_event_loop()
373 asyncio.set_event_loop(loop)
--> 374 return loop.run_until_complete(
375 self.aquery(
376 query,
377 k=k,
378 max_sources=max_sources,
379 length_prompt=length_prompt,
380 marginal_relevance=marginal_relevance,
381 answer=answer,
382 key_filter=key_filter,
383 )
384 )
File ~/.local/lib/python3.10/site-packages/nest_asyncio.py:89, in _patch_loop.<locals>.run_until_complete(self, future)
86 if not f.done():
87 raise RuntimeError(
88 'Event loop stopped before Future completed.')
---> 89 return f.result()
File /usr/lib/python3.10/asyncio/futures.py:201, in Future.result(self)
199 self.__log_traceback = False
200 if self._exception is not None:
--> 201 raise self._exception.with_traceback(self._exception_tb)
202 return self._result
File /usr/lib/python3.10/asyncio/tasks.py:234, in Task.__step(***failed resolving arguments***)
232 result = coro.send(None)
233 else:
--> 234 result = coro.throw(exc)
235 except StopIteration as exc:
236 if self._must_cancel:
237 # Task is cancelled right before coro stops.
File ~/.local/lib/python3.10/site-packages/paperqa/docs.py:406, in Docs.aquery(self, query, k, max_sources, length_prompt, marginal_relevance, answer, key_filter)
404 answer.tokens += cb.total_tokens
405 answer.cost += cb.total_cost
--> 406 answer = await self.aget_evidence(
407 answer,
408 k=k,
409 max_sources=max_sources,
410 marginal_relevance=marginal_relevance,
411 key_filter=keys if key_filter else None,
412 )
413 context_str, contexts = answer.context, answer.contexts
414 bib = dict()
File ~/.local/lib/python3.10/site-packages/paperqa/docs.py:311, in Docs.aget_evidence(self, answer, k, max_sources, marginal_relevance, key_filter)
308 return None
310 with get_openai_callback() as cb:
--> 311 contexts = await asyncio.gather(*[process(doc) for doc in docs])
312 answer.tokens += cb.total_tokens
313 answer.cost += cb.total_cost
File /usr/lib/python3.10/asyncio/tasks.py:304, in Task.__wakeup(self, future)
302 def __wakeup(self, future):
303 try:
--> 304 future.result()
305 except BaseException as exc:
306 # This may also be a cancellation.
307 self.__step(exc)
File /usr/lib/python3.10/asyncio/tasks.py:232, in Task.__step(***failed resolving arguments***)
228 try:
229 if exc is None:
230 # We use the `send` method directly, because coroutines
231 # don't have `__iter__` and `__next__` methods.
--> 232 result = coro.send(None)
233 else:
234 result = coro.throw(exc)
File ~/.local/lib/python3.10/site-packages/paperqa/docs.py:299, in Docs.aget_evidence.<locals>.process(doc)
294 if doc.metadata["key"] in [c.key for c in answer.contexts]:
295 return None
296 c = Context(
297 key=doc.metadata["key"],
298 citation=doc.metadata["citation"],
--> 299 context=await self.summary_chain.arun(
300 question=answer.question,
301 context_str=doc.page_content,
302 citation=doc.metadata["citation"],
303 ),
304 text=doc.page_content,
305 )
306 if "Not applicable" not in c.context:
307 return c
File ~/.local/lib/python3.10/site-packages/langchain/chains/base.py:237, in Chain.arun(self, *args, **kwargs)
234 return (await self.acall(args[0]))[self.output_keys[0]]
236 if kwargs and not args:
--> 237 return (await self.acall(kwargs))[self.output_keys[0]]
239 raise ValueError(
240 f"`run` supported with either positional arguments or keyword arguments"
241 f" but not both. Got args: {args} and kwargs: {kwargs}."
242 )
File ~/.local/lib/python3.10/site-packages/langchain/chains/base.py:154, in Chain.acall(self, inputs, return_only_outputs)
152 else:
153 self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 154 raise e
155 if self.callback_manager.is_async:
156 await self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
File ~/.local/lib/python3.10/site-packages/langchain/chains/base.py:148, in Chain.acall(self, inputs, return_only_outputs)
142 self.callback_manager.on_chain_start(
143 {"name": self.__class__.__name__},
144 inputs,
145 verbose=self.verbose,
146 )
147 try:
--> 148 outputs = await self._acall(inputs)
149 except (KeyboardInterrupt, Exception) as e:
150 if self.callback_manager.is_async:
File ~/.local/lib/python3.10/site-packages/langchain/chains/llm.py:135, in LLMChain._acall(self, inputs)
134 async def _acall(self, inputs: Dict[str, Any]) -> Dict[str, str]:
--> 135 return (await self.aapply([inputs]))[0]
File ~/.local/lib/python3.10/site-packages/langchain/chains/llm.py:123, in LLMChain.aapply(self, input_list)
121 async def aapply(self, input_list: List[Dict[str, Any]]) -> List[Dict[str, str]]:
122 """Utilize the LLM generate method for speed gains."""
--> 123 response = await self.agenerate(input_list)
124 return self.create_outputs(response)
File ~/.local/lib/python3.10/site-packages/langchain/chains/llm.py:67, in LLMChain.agenerate(self, input_list)
65 """Generate LLM result from inputs."""
66 prompts, stop = await self.aprep_prompts(input_list)
---> 67 return await self.llm.agenerate_prompt(prompts, stop)
File ~/.local/lib/python3.10/site-packages/langchain/llms/base.py:113, in BaseLLM.agenerate_prompt(self, prompts, stop)
109 async def agenerate_prompt(
110 self, prompts: List[PromptValue], stop: Optional[List[str]] = None
111 ) -> LLMResult:
112 prompt_strings = [p.to_string() for p in prompts]
--> 113 return await self.agenerate(prompt_strings, stop=stop)
File ~/.local/lib/python3.10/site-packages/langchain/llms/base.py:229, in BaseLLM.agenerate(self, prompts, stop)
227 else:
228 self.callback_manager.on_llm_error(e, verbose=self.verbose)
--> 229 raise e
230 if self.callback_manager.is_async:
231 await self.callback_manager.on_llm_end(
232 new_results, verbose=self.verbose
233 )
File ~/.local/lib/python3.10/site-packages/langchain/llms/base.py:223, in BaseLLM.agenerate(self, prompts, stop)
217 self.callback_manager.on_llm_start(
218 {"name": self.__class__.__name__},
219 missing_prompts,
220 verbose=self.verbose,
221 )
222 try:
--> 223 new_results = await self._agenerate(missing_prompts, stop=stop)
224 except (KeyboardInterrupt, Exception) as e:
225 if self.callback_manager.is_async:
File ~/.local/lib/python3.10/site-packages/langchain/llms/base.py:334, in LLM._agenerate(self, prompts, stop)
332 generations = []
333 for prompt in prompts:
--> 334 text = await self._acall(prompt, stop=stop)
335 generations.append([Generation(text=text)])
336 return LLMResult(generations=generations)
File ~/.local/lib/python3.10/site-packages/langchain/llms/base.py:315, in LLM._acall(self, prompt, stop)
313 async def _acall(self, prompt: str, stop: Optional[List[str]] = None) -> str:
314 """Run the LLM on the given prompt and input."""
--> 315 raise NotImplementedError("Async generation not implemented for this LLM.")
NotImplementedError: Async generation not implemented for this LLM.
What I did was to:
- Get the llama 7B model here: https://github.com/cocktailpeanut/dalai
- Process it according to the instructions for the llama.cpp repository: https://github.com/ggerganov/llama.cpp
- Use the final models with paper-qa
You can try the example here first; it should be much faster than paper-qa, which makes many calls to the LLM: https://python.langchain.com/en/latest/modules/models/text_embedding/examples/llamacpp.html
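If it helps, here is roughly what that standalone check looks like (untested sketch; the model file name and path are placeholders for whatever your llama.cpp conversion produced):

from langchain.embeddings import LlamaCppEmbeddings
from langchain.llms import LlamaCpp

# Placeholder path: point this at your converted/quantized ggml model file.
model_path = "./models/7B/ggml-model-q4_0.bin"

llm = LlamaCpp(model_path=model_path)
embeddings = LlamaCppEmbeddings(model_path=model_path)

# One completion and one embedding, to confirm the model loads and runs
# before handing it to paper-qa.
print(llm("Q: What is a bispecific antibody? A:"))
print(len(embeddings.embed_query("bispecific antibody manufacture")))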
@Kohulan I'll take a look though - maybe langchain doesn't have an async fallback.
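In the meantime, one possible (untested) workaround sketch is to subclass LlamaCpp and add the missing async hook by running the blocking call in a thread executor. The AsyncLlamaCpp name is made up for illustration; everything else only uses the standard langchain LLM interface:

import asyncio
from functools import partial
from typing import List, Optional

from langchain.llms import LlamaCpp

class AsyncLlamaCpp(LlamaCpp):
    """LlamaCpp with a naive async fallback."""

    async def _acall(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Run the synchronous _call in the default thread executor so that
        # paper-qa's asyncio.gather over summary calls does not hit
        # NotImplementedError. Generation is still serialized by llama.cpp.
        loop = asyncio.get_event_loop()
        return await loop.run_in_executor(None, partial(self._call, prompt, stop))

llm = AsyncLlamaCpp(model_path="ggml-model-q4_1.bin")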
@kjelljorner Thank you. I had no issues with the models themselves; the problem was the missing async fallback. @whitead Thanks for looking into this. I will update this thread if I come up with a solution.
Langchain has this on their roadmap, so I think we'll just leave this open and wait for upstream.
@whitead Great thanks for the update!
I am having the exact same issue right now. Any update on this yet?
#92 should fix this
@whitead Great, thanks a lot! I will update you.
@whitead I am getting the following error while using LLaMA. I am using the same code shown in the local LLM example.
ImportError: cannot import name 'AsyncCallbackManager' from 'langchain.callbacks.base'
Any idea what is causing this?
Hello @fbyukgo and @Kohulan, we have just released version 5, which completely outsources all LLM management to https://github.com/BerriAI/litellm.
As such, I am going to close this out. If your issue persists, please open a new issue using paper-qa>=5.
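For anyone landing here later: local models in paper-qa>=5 are configured through litellm model strings rather than langchain objects. A rough sketch, assuming a local Ollama server on its default port; the model names, paper_directory value, and config layout are assumptions to adapt from the current paper-qa README:

from paperqa import Settings, ask

# Assumed setup: an Ollama server at localhost:11434 serving "llama3" and
# an embedding model; swap in whatever litellm-compatible models you run.
local_llm_config = {
    "model_list": [
        {
            "model_name": "ollama/llama3",
            "litellm_params": {
                "model": "ollama/llama3",
                "api_base": "http://localhost:11434",
            },
        }
    ]
}

answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(
        llm="ollama/llama3",
        llm_config=local_llm_config,
        summary_llm="ollama/llama3",
        summary_llm_config=local_llm_config,
        embedding="ollama/mxbai-embed-large",
        paper_directory="my_papers",  # assumed folder of PDFs to index
    ),
)
print(answer)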