
LLMRouterChain uses deprecated predict_and_parse method

Open amosjyng opened this issue 1 year ago • 2 comments

System Info

langchain v0.0.216, Python 3.11.3 on WSL2

Who can help?

@hwchase17

Information

  • [X] The official example notebooks/scripts
  • [ ] My own modified scripts

Related Components

  • [ ] LLMs/Chat Models
  • [ ] Embedding Models
  • [X] Prompts / Prompt Templates / Prompt Selectors
  • [ ] Output Parsers
  • [ ] Document Loaders
  • [ ] Vector Stores / Retrievers
  • [ ] Memory
  • [ ] Agents / Agent Executors
  • [ ] Tools / Toolkits
  • [X] Chains
  • [ ] Callbacks/Tracing
  • [ ] Async

Reproduction

Follow the first example at https://python.langchain.com/docs/modules/chains/foundational/router

Expected behavior

This line gets triggered:

The predict_and_parse method is deprecated, instead pass an output parser directly to LLMChain.

As the warning suggests, we can pass the output parser directly to LLMChain by changing this line to:

llm_chain = LLMChain(llm=llm, prompt=prompt, output_parser=prompt.output_parser)

And call LLMChain.__call__ instead of LLMChain.predict_and_parse by changing these lines to:

cast(
    Dict[str, Any],
    self.llm_chain(inputs, callbacks=callbacks),
)

Unfortunately, while this avoids the warning, it creates a new error:

ValueError: Missing some output keys: {'destination', 'next_inputs'}

because LLMChain currently assumes the existence of a single self.output_key and produces this as output:

{'text': {'destination': 'physics', 'next_inputs': {'input': 'What is black body radiation?'}}}

Even modifying that function to return the parsed dict's keys directly (when the parsed output is a dict) triggers the same error, this time for the missing key "text". predict_and_parse avoids this fate by skipping output validation entirely.

It appears changes may have to be a bit more involved here if LLMRouterChain is to keep using LLMChain.
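The validation conflict described above can be reproduced without LangChain at all. In this minimal sketch, `validate_outputs` is a hypothetical stand-in for the chain's output-key validation (it mirrors the behavior described in this issue, not LangChain's actual code); it shows why the parsed dict nested under "text" can never satisfy the router's expected keys, and why flattening it fails the other way:

```python
# Hypothetical stand-in for LLMChain-style output validation,
# mirroring the behavior described above (not LangChain's actual code).
def validate_outputs(outputs: dict, output_keys: list) -> None:
    missing = set(output_keys) - set(outputs)
    if missing:
        raise ValueError(f"Missing some output keys: {missing}")

# What LLMChain produces: the parsed dict nested under its single output key.
result = {"text": {"destination": "physics",
                   "next_inputs": {"input": "What is black body radiation?"}}}

# The router expects these keys at the top level, so validation fails:
try:
    validate_outputs(result, ["destination", "next_inputs"])
except ValueError as err:
    print(err)

# Flattening the parsed dict instead fails the other way: "text" is missing.
try:
    validate_outputs(result["text"], ["text"])
except ValueError as err:
    print(err)
```

Either shape of the output dict trips the validator, which is why the fix has to go deeper than just swapping the call site.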

amosjyng avatar Jun 27 '23 11:06 amosjyng

I am seeing the same issue on v0.0.221, Python 3.10.6, Windows 10

tvmaly avatar Jul 03 '23 03:07 tvmaly

I have the same issue on v0.0.229, Python v3.10.12

AI-Chef avatar Jul 11 '23 08:07 AI-Chef

I have the same issue on v0.0.230, Python v3.10.6 Windows 11

liaokaime avatar Jul 12 '23 02:07 liaokaime

same issue here

xzhang8g avatar Jul 13 '23 20:07 xzhang8g

langchain v0.0.232

/opt/homebrew/lib/python3.11/site-packages/langchain/chains/llm.py:275: UserWarning: The predict_and_parse method is deprecated, instead pass an output parser directly to LLMChain.

alexminza avatar Jul 14 '23 12:07 alexminza

langchain v0.0.235, Python 3.9.17 Windows 10

bbirdxr avatar Jul 20 '23 03:07 bbirdxr

langchain v0.0.240, Python 3.10.10 macOS Ventura

ellisxu avatar Jul 28 '23 08:07 ellisxu

langchain v0.0.244, Python 3.10.11 Windows 10

mpearce-bain avatar Aug 03 '23 15:08 mpearce-bain

I am seeing the same issue on v0.0.257 Python 3.9.12 RedHat

ZHANGJUN-OK avatar Aug 15 '23 02:08 ZHANGJUN-OK

same issue v0.0.270 python 3.11.3 windows

prdy20 avatar Aug 23 '23 21:08 prdy20

Same warning on v0.0.275 python 3.11.3 WSL on Windows 11

gtmray avatar Aug 29 '23 08:08 gtmray

I also get the same error when using this option:

print(compressed_docs[0].page_content)

If I remove the page_content access, I do not get this error. I also do not see the error if I use CharacterTextSplitter instead:

text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=500)

Here is a sample function where this happens:

from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor


def demo(question):
    '''Load a document, split it into chunks, embed them into a persisted
    ChromaDB, and print the most relevant compressed chunk for the question.'''
    # PART ONE: load the source document
    loader = TextLoader('/map/to/document/data.txt', encoding='utf8')
    documents = loader.load()

    # PART TWO: split the document into chunks (you choose how and what size)
    # text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=1000)
    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=1000, chunk_overlap=100, separators=[" ", ",", "\n"])
    docs = text_splitter.split_documents(documents)

    # PART THREE: embed the documents (now in chunks) into a persisted ChromaDB
    embedding_function = OpenAIEmbeddings()
    db = Chroma.from_documents(docs, embedding_function, persist_directory='./App')
    db.persist()  # note: was db.persist, which missed the call parentheses

    # PART FOUR: use ChatOpenAI and ContextualCompressionRetriever to return
    # the most relevant part of the documents
    llm = ChatOpenAI(temperature=0)
    compressor = LLMChainExtractor.from_llm(llm)
    compression_retriever = ContextualCompressionRetriever(
        base_compressor=compressor, base_retriever=db.as_retriever())
    compressed_docs = compression_retriever.get_relevant_documents(question)

    print(compressed_docs[0].page_content)

ja4h3ad avatar Aug 30 '23 22:08 ja4h3ad

Y'all, is there any fix for this? What are we supposed to do?

mikeymice avatar Sep 05 '23 10:09 mikeymice

Got the same warning while using load_qa_chain with chain_type='map_rerank'.

ton77v avatar Sep 12 '23 05:09 ton77v

I posted a very similar issue #10462, using SelfQueryRetriever. I received some sort of solution from the chatbot. However, I don't have a clue how to implement it.

RoderickVM avatar Sep 14 '23 17:09 RoderickVM

I posted a very similar issue #10462, using SelfQueryRetriever. I received some sort of solution from the chatbot. However, I don't have a clue how to implement it.

I solved it by extending SelfQueryRetriever and overriding the _get_relevant_documents method. Below is an example. (In this example I override _aget_relevant_documents because I need async features in my case; you can do the same to _get_relevant_documents.)

class AsyncSelfQueryRetriever(SelfQueryRetriever):
    async def _aget_relevant_documents(
        self, query: str, *, run_manager: AsyncCallbackManagerForRetrieverRun
    ) -> List[Document]:
        """Asynchronously get documents relevant to a query.
        Args:
            query: String to find relevant documents for
            run_manager: The callbacks handler to use
        Returns:
            List of relevant documents
        """
        inputs = self.llm_chain.prep_inputs({"query": query})

        structured_query = cast(
            StructuredQuery,
            # Instead of calling 'self.llm_chain.predict_and_parse' here, 
            # I changed it to leveraging 'self.llm_chain.prompt.output_parser.parse' 
            # and 'self.llm_chain.apredict'
            # ↓↓↓↓↓↓↓
            self.llm_chain.prompt.output_parser.parse(
                await self.llm_chain.apredict(
                    callbacks=run_manager.get_child(), **inputs
                )
            ),
        )
        if self.verbose:
            print(structured_query)
        new_query, new_kwargs = self.structured_query_translator.visit_structured_query(
            structured_query
        )
        if structured_query.limit is not None:
            new_kwargs["k"] = structured_query.limit

        if self.use_original_query:
            new_query = query

        search_kwargs = {**self.search_kwargs, **new_kwargs}
        docs = await self.vectorstore.asearch(
            new_query, self.search_type, **search_kwargs
        )
        return docs
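Stripped of LangChain specifics, the core pattern in the snippet above is "predict the raw text, then apply the prompt's output parser yourself" instead of the deprecated predict_and_parse. A minimal sketch with stand-in classes (StubLLMChain and StubOutputParser are hypothetical; a real chain would call the model and a real parser would handle the model's actual output format):

```python
import json


class StubOutputParser:
    """Stands in for a prompt's output_parser; parses the model's raw text."""
    def parse(self, text: str) -> dict:
        return json.loads(text)


class StubLLMChain:
    """Stands in for LLMChain; a real chain would call the model here."""
    def predict(self, **inputs) -> str:
        return ('{"destination": "physics", '
                '"next_inputs": {"input": "What is black body radiation?"}}')


# The replacement for the deprecated predict_and_parse: two explicit steps,
# prediction and parsing, so no output-key validation gets in the way.
chain = StubLLMChain()
parser = StubOutputParser()
parsed = parser.parse(chain.predict(query="What is black body radiation?"))
print(parsed["destination"])  # physics
```

Because the parsing step happens outside the chain call, the chain only ever returns its raw text output and the nested-dict validation problem from earlier in this thread never arises.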

ellisxu avatar Sep 15 '23 09:09 ellisxu

I posted a very similar issue #10462, using SelfQueryRetriever. I received some sort of solution from the chatbot. However, I don't have a clue how to implement it.

I solved it by extending SelfQueryRetriever and overriding the _get_relevant_documents method. Below is an example. (In this example I override _aget_relevant_documents because I need async features in my case; you can do the same to _get_relevant_documents.)

Hi @ellisxu, much appreciated for sharing! Can you show which packages these are imported from? Thanks a lot!

a92340a avatar Oct 05 '23 01:10 a92340a

I posted a very similar issue #10462, using SelfQueryRetriever. I received some sort of solution from the chatbot. However, I don't have a clue how to implement it.

I solved it by extending SelfQueryRetriever and overriding the _get_relevant_documents method. Below is an example. (In this example I override _aget_relevant_documents because I need async features in my case; you can do the same to _get_relevant_documents.)

Hi @ellisxu, much appreciated for sharing! Can you show which packages these are imported from? Thanks a lot!

The import is from langchain.retrievers.self_query.base import SelfQueryRetriever, and the langchain version I use is 0.0.279.

ellisxu avatar Oct 15 '23 01:10 ellisxu

I am still running into this with SelfQueryRetriever using langchain==0.0.302... is there any resolution here?

nickeleres avatar Nov 21 '23 19:11 nickeleres

I am still running into this with SelfQueryRetriever using langchain==0.0.302... is there any resolution here?

https://github.com/langchain-ai/langchain/issues/6819#issuecomment-1720942610 Try this. :)

ellisxu avatar Nov 22 '23 03:11 ellisxu

This is unfortunate, as this part of LangChain is used in the DeepLearningAI course.

tvmaly avatar Feb 29 '24 01:02 tvmaly