
Docs() generates exception due to langchain AIMessage() not supplying __len__() when I specify a langchain llm

Open · maspotts opened this issue 1 year ago · 1 comment

I've just noticed that if I specify a langchain LLM, e.g.:

    return Docs(llm = 'langchain', client = ChatOpenAI(), index_path = index_path)

and then try to index a document I get this exception:

    Traceback (most recent call last):
      File "/Users/mike/src/chatbot/./chatbot-multi-wrap", line 11842, in <module>
        bot.build(corpus)
      File "/Users/mike/src/chatbot/./chatbot-multi-wrap", line 5033, in build
        index.add(path)
      File "/Users/mike/.pyenv/versions/3.10.7/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paperqa/docs.py", line 321, in add
        return loop.run_until_complete(
      File "/Users/mike/.pyenv/versions/3.10.7/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
        return future.result()
      File "/Users/mike/.pyenv/versions/3.10.7/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paperqa/docs.py", line 356, in aadd
        chain_result = await cite_chain({"text": texts[0].text}, None)
      File "/Users/mike/.pyenv/versions/3.10.7/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paperqa/llms.py", line 299, in execute
        result.completion_count = self.count_tokens(output)
      File "/Users/mike/.pyenv/versions/3.10.7/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paperqa/llms.py", line 175, in count_tokens
        return len(text) // 4  # gross approximation
    TypeError: object of type 'AIMessage' has no len()

which seems to be due to langchain wrapping responses in its own AIMessage class, which doesn't supply a `__len__()` method. Is this my mistake, or is it something that needs patching? Thanks!

Update: I'm using paper-qa 4.4.0 with langchain 0.1.13

maspotts avatar Mar 24 '24 17:03 maspotts

I guess the root issue is that LLMModel.count_tokens() expects text to be a string, whereas it is (now, lately?) a langchain_core.messages.AIMessage, whose content is a string (or a list of strings/dicts). So the best approach might be to rewrite LLMModel to accept either a string or an AIMessage and process it accordingly? I tried monkey-patching LLMModel to test this out, but I got sucked into a maze of imports and methods that receive AIMessage instances (another is in Docs.aadd(), where len(citation) similarly throws an exception).
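A sketch of that approach (`extract_text` is a hypothetical helper, not part of paper-qa; `FakeAIMessage` again stands in for the real langchain class): coerce an AIMessage-like object to its `.content` text before counting, handling both the plain-string and list-of-parts forms of `content`.

```python
def extract_text(output) -> str:
    """Coerce a langchain-style message (or a plain str) to text.

    AIMessage.content can be a str or a list of parts (strings or
    dicts with a "text" field), so both shapes are handled.
    """
    content = getattr(output, "content", output)
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "".join(
            p if isinstance(p, str) else p.get("text", "") for p in content
        )
    return str(content)


def count_tokens(output) -> int:
    # Same gross approximation as paperqa/llms.py, but AIMessage-safe
    return len(extract_text(output)) // 4


class FakeAIMessage:  # stand-in for langchain_core.messages.AIMessage
    def __init__(self, content):
        self.content = content


print(count_tokens("abcd" * 5))                 # 5
print(count_tokens(FakeAIMessage("abcd" * 5)))  # 5, no TypeError
```

The same coercion would be needed anywhere else an AIMessage leaks through, e.g. before the len(citation) call in Docs.aadd().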

maspotts avatar Mar 24 '24 19:03 maspotts

Hello @maspotts, we have just released version 5, which completely removes LangChain from our stack and centers on https://github.com/BerriAI/litellm. So feel free to use LangChain as needed.

If your issue persists, please open a new issue using paper-qa>=5

jamesbraza avatar Sep 11 '24 17:09 jamesbraza