langchain
community: introduce mlflow output parsers
In MLflow 2.13.0, we will introduce RAG model signatures defined as dataclasses to support extensibility.
This PR introduces matching output parsers for those model signatures, so that users don't have to reason about how to transform their chains into MLflow RAG-compatible formats. For example, a RAG chain like the one below can simply end with ChatCompletionsOutputParser():
# Inside the chain-builder function; the prompts, retriever, extract_question,
# extract_history, format_context, and rag_config are defined elsewhere.
return (
    {
        # Pull the latest question and the prior turns out of the chat messages.
        "question": itemgetter("messages") | RunnableLambda(extract_question),
        "chat_history": itemgetter("messages") | RunnableLambda(extract_history),
    }
    | RunnablePassthrough()
    | {
        # Rewrite the question into a retrieval query and fetch relevant documents.
        "relevant_docs": generate_query_to_retrieve_context_prompt
        | chat_model
        | StrOutputParser()
        | retriever,
        "chat_history": itemgetter("chat_history"),
        "question": itemgetter("question"),
    }
    | {
        # Format the retrieved documents as context for the final prompt.
        "context": itemgetter("relevant_docs") | RunnableLambda(format_context),
        "chat_history": itemgetter("chat_history"),
        "question": itemgetter("question"),
    }
    | question_with_history_and_context_prompt
    | ChatDatabricks(
        endpoint=rag_config.get("llm_model"),
        **rag_config.get("llm_parameters"),
    )
    # Emit the output in the MLflow chat-completions format.
    | ChatCompletionsOutputParser()
)
@efriis @baskaryan could one of you take a look at this PR please? Thank you!
Howdy! Could you add some documentation of how this might be used?
Because these are so lightweight, I actually think they will introduce more maintenance cost than ease of use. Instead, it might be better to just add some documentation on where this is intended to be used and recommend a RunnableLambda like:
from mlflow.models.rag_signatures import StringResponse

chain = llm | (lambda x: StringResponse(content=x))
Hey @efriis, I tried out the approach you described, but I think it's a little harder because you now need to extract the data from the AIMessage into the StringResponse object yourself. As I understand it, using the BaseTransformOutputParser makes that part easy.
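(For illustration, a minimal sketch of the idea; the class name is hypothetical and the real parser in this PR may differ, but it shows how the base class hands you the message content directly:)

from langchain_core.output_parsers import BaseTransformOutputParser
from mlflow.models.rag_signatures import StringResponse


class StringResponseOutputParser(BaseTransformOutputParser[StringResponse]):
    """Map each model output (or streamed chunk) to the MLflow StringResponse dataclass."""

    def parse(self, text: str) -> StringResponse:
        # The base class has already pulled the string content out of the
        # AIMessage (or AIMessageChunk when streaming), so no manual unpacking.
        return StringResponse(content=text)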
I wrote a chain like this:
...
| chat_model
| RunnableLambda(lambda x: StringResponse(content=x))
)
but that will create an object like this:
ChatCompletionResponse(choices=[ChainCompletionChoice(index=0, message=Message(role='user', content=AIMessage(content='\nDatabricks provides several tools for exploratory data analysis (EDA), including:\n\n1. Databricks notebooks: These are collaborative, interactive coding environments where you can perform data manipulation, analysis, and visualization using Python, SQL, Scala, and R.\n\n2. Data visualization tools: Databricks notebooks support various data visualization libraries like matplotlib, seaborn, and Plotly, enabling you to create charts, plots, and graphs to better understand your data.\n\n3. Delta Live Tables: A no-code/low-code data pipeline creation tool that simplifies ETL and data transformation processes, ensuring data quality and reliability.\n\n4. Structured Streaming: A powerful tool for building streaming, incremental, and real-time workloads, allowing you to analyze and process data as it arrives.\n\n5. Summary statistics generation: Databr', response_metadata={'prompt_tokens': 1572, 'completion_tokens': 200, 'total_tokens': 1772}, id='run-61dbb62c-51cb-4ffe-a02e-426a6dd1dbe9-0')), finish_reason='stop')])
Great catch for chat models! Thoughts on:
from mlflow.models.rag_signatures import StringResponse
chain = llm | (lambda x: StringResponse(content=x.content))
for chat models (and the previous one for LLMs)
@efriis, I think the RunnableLambda approach does not work for streaming. See this screenshot: when I call list(full_chain.stream(..)) I get just one object with the whole response.
If I use the ChatCompletionsOutputParser from my PR, then I am able to get each chunk when I call list(full_chain.stream(..)).
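(To make the difference concrete, a hypothetical comparison; chat_model is the ChatDatabricks instance from the snippet above and the input string is only illustrative:)

from langchain_core.runnables import RunnableLambda
from mlflow.models.rag_signatures import StringResponse

# RunnableLambda only runs once, on the fully aggregated AIMessage, so the
# stream collapses to a single output object.
lambda_chain = chat_model | RunnableLambda(lambda x: StringResponse(content=x.content))
print(len(list(lambda_chain.stream("hello"))))  # 1

# ChatCompletionsOutputParser (added in this PR) is applied to every streamed chunk.
parser_chain = chat_model | ChatCompletionsOutputParser()
print(len(list(parser_chain.stream("hello"))))  # one parsed object per chunk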
Responding in Slack.
Discussed in Slack. The recommendation is to use:
from mlflow.models.rag_signatures import StringResponse

chain = llm | (lambda x: StringResponse(content=x.content))
Oh, and if you want to use streaming, you can do:
from langchain_core.runnables import chain
from mlflow.models.rag_signatures import StringResponse


@chain
def string_response_output_parser(input):
    yield StringResponse(content=input.content)


my_chain = llm | string_response_output_parser
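(A hypothetical call, assuming llm is a chat model such as the ChatDatabricks instance above; the question string is only illustrative:)

# my_chain now returns the MLflow StringResponse dataclass instead of a raw AIMessage.
print(my_chain.invoke("What tools does Databricks provide for EDA?"))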