langchain
community: introduce mlflow output parsers
In MLflow 2.13.0, we will introduce RAG model signatures defined as dataclasses to support extensibility.
This PR introduces matching output parsers for those model signatures, so that users don't have to reason about how to transform their chains into MLflow RAG-compatible formats. For example, a RAG chain like the one below can simply end with ChatCompletionsOutputParser():
# Inside the chain-builder function; the prompts, retriever, extract_question,
# extract_history, format_context, and rag_config are defined elsewhere.
return (
    {
        # Pull the latest question and the prior turns out of the chat messages.
        "question": itemgetter("messages") | RunnableLambda(extract_question),
        "chat_history": itemgetter("messages") | RunnableLambda(extract_history),
    }
    | RunnablePassthrough()
    | {
        # Rewrite the question into a retrieval query and fetch relevant documents.
        "relevant_docs": generate_query_to_retrieve_context_prompt
        | chat_model
        | StrOutputParser()
        | retriever,
        "chat_history": itemgetter("chat_history"),
        "question": itemgetter("question"),
    }
    | {
        # Format the retrieved documents as context for the final prompt.
        "context": itemgetter("relevant_docs") | RunnableLambda(format_context),
        "chat_history": itemgetter("chat_history"),
        "question": itemgetter("question"),
    }
    | question_with_history_and_context_prompt
    | ChatDatabricks(
        endpoint=rag_config.get("llm_model"),
        **rag_config.get("llm_parameters"),
    )
    # Emit the output in the MLflow chat-completions format.
    | ChatCompletionsOutputParser()
)
@efriis @baskaryan could one of you take a look at this PR please? Thank you!
Howdy! Could you add some documentation of how this might be used?
Because these are so lightweight, I actually think they will introduce more maintenance cost than ease of use. Instead, it might be better to just add some documentation on where this is intended to be used and recommend a RunnableLambda like:
from mlflow.models.rag_signatures import StringResponse

chain = llm | (lambda x: StringResponse(content=x))
Hey @efriis, I tried out the approach you described, but I think it's a little harder because you now need to extract the data from the AIMessage into the StringResponse object yourself. As I understand it, using the BaseTransformOutputParser makes that part easy.
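(For illustration, a minimal sketch of the idea; the class name is hypothetical and the real parser in this PR may differ, but it shows how the base class hands you the message content directly:)

from langchain_core.output_parsers import BaseTransformOutputParser
from mlflow.models.rag_signatures import StringResponse


class StringResponseOutputParser(BaseTransformOutputParser[StringResponse]):
    """Map each model output (or streamed chunk) to the MLflow StringResponse dataclass."""

    def parse(self, text: str) -> StringResponse:
        # The base class has already pulled the string content out of the
        # AIMessage (or AIMessageChunk when streaming), so no manual unpacking.
        return StringResponse(content=text)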
I wrote a chain like this:
...
| chat_model
| RunnableLambda(lambda x: StringResponse(content=x))
)
but that will create an object like this:
ChatCompletionResponse(choices=[ChainCompletionChoice(index=0, message=Message(role='user', content=AIMessage(content='\nDatabricks provides several tools for exploratory data analysis (EDA), including:\n\n1. Databricks notebooks: These are collaborative, interactive coding environments where you can perform data manipulation, analysis, and visualization using Python, SQL, Scala, and R.\n\n2. Data visualization tools: Databricks notebooks support various data visualization libraries like matplotlib, seaborn, and Plotly, enabling you to create charts, plots, and graphs to better understand your data.\n\n3. Delta Live Tables: A no-code/low-code data pipeline creation tool that simplifies ETL and data transformation processes, ensuring data quality and reliability.\n\n4. Structured Streaming: A powerful tool for building streaming, incremental, and real-time workloads, allowing you to analyze and process data as it arrives.\n\n5. Summary statistics generation: Databr', response_metadata={'prompt_tokens': 1572, 'completion_tokens': 200, 'total_tokens': 1772}, id='run-61dbb62c-51cb-4ffe-a02e-426a6dd1dbe9-0')), finish_reason='stop')])
Great catch for chat models! Thoughts on:
from mlflow.models.rag_signatures import StringResponse
chain = llm | (lambda x: StringResponse(content=x.content))
for chat models (and the previous one for LLMs)
@efriis, I think the RunnableLambda approach does not work for streaming. See this screenshot: when I call list(full_chain.stream(..)) I get just one object with the whole response.
If I use the ChatCompletionsOutputParser from my PR, then I am able to get each chunk when I call list(full_chain.stream(..)).
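(To make the difference concrete, a hypothetical comparison; chat_model is the ChatDatabricks instance from the snippet above and the input string is only illustrative:)

from langchain_core.runnables import RunnableLambda
from mlflow.models.rag_signatures import StringResponse

# RunnableLambda only runs once, on the fully aggregated AIMessage, so the
# stream collapses to a single output object.
lambda_chain = chat_model | RunnableLambda(lambda x: StringResponse(content=x.content))
print(len(list(lambda_chain.stream("hello"))))  # 1

# ChatCompletionsOutputParser (added in this PR) is applied to every streamed chunk.
parser_chain = chat_model | ChatCompletionsOutputParser()
print(len(list(parser_chain.stream("hello"))))  # one parsed object per chunk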
Responding in Slack.
Discussed in Slack. The recommendation is to use:
from mlflow.models.rag_signatures import StringResponse

chain = llm | (lambda x: StringResponse(content=x.content))
Oh, and if you want to use streaming, you can do:
from langchain_core.runnables import chain
from mlflow.models.rag_signatures import StringResponse


@chain
def string_response_output_parser(input):
    yield StringResponse(content=input.content)


my_chain = llm | string_response_output_parser
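(A hypothetical call, assuming llm is a chat model such as the ChatDatabricks instance above; the question string is only illustrative:)

# my_chain now returns the MLflow StringResponse dataclass instead of a raw AIMessage.
print(my_chain.invoke("What tools does Databricks provide for EDA?"))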