Is it possible to add streaming to a ConversationChain?
I currently have the following chain and I was wondering if it is possible to stream the LLM response. I have checked the docs and tried using chain.stream, but it doesn't work; however, if I use chat.stream, it streams the response fine. Can someone enlighten me on how to achieve this?
const chat = new ChatOllama({
  baseUrl: "http://localhost:11434",
  model: "neural-chat",
  temperature: 0.5,
});

const chain = new ConversationChain({
  memory: new BufferWindowMemory({
    returnMessages: true,
    memoryKey: "history",
    k: 3,
    chatHistory: new ChatMessageHistory(pastMessages),
  }),
  prompt: chatPrompt,
  llm: chat,
});
🤖
Hey there @gzuuus! 👋 I'm Dosu, a friendly bot here to assist you while we wait for a human maintainer to join us. I specialize in solving bugs, answering questions, and even helping you become a contributor. Just remember, I'm here to help, so feel free to ask anything about your issue. Let's get this sorted out together! 🤖 🔧
Based on the information provided, it seems like you're trying to add streaming to a ConversationChain in the LangChainJS framework. However, the ConversationChain class might not inherently support streaming in the same way the ChatOllama class does.
The ConversationChain class is designed to process the user's input and generate a response from the AI by calling its call method. If you want to add streaming to this process, you would need to modify the call method to yield results instead of returning them all at once. This would involve turning the call method into an async generator function, similar to the modification suggested in issue #3293.
Here's an example of how you could modify the call method to support streaming:
/** @ignore */
async *_call(
  values: ChainValues,
  runManager?: CallbackManagerForChainRun
): AsyncGenerator<ChainValues> {
  if (!(this.inputKey in values)) {
    throw new Error(`Question key "${this.inputKey}" not found.`);
  }
  const question: string = values[this.inputKey];
  const docs = await this.retriever.getRelevantDocuments(
    question,
    runManager?.getChild("retriever")
  );
  const inputs = { question, input_documents: docs, ...values };
  const result = await this.combineDocumentsChain.call(
    inputs,
    runManager?.getChild("combine_documents")
  );
  if (this.returnSourceDocuments) {
    yield {
      ...result,
      sourceDocuments: docs,
    };
  }
  yield result;
}
In this modification, the call method is now an async generator function, which means it can yield multiple values over time. This allows it to stream results back to the caller, rather than returning them all at once. The yield keyword produces a value that can be consumed by the caller, and the function's execution is then paused until the caller retrieves the next value.
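As a minimal, framework-independent sketch of that mechanic, here is how a caller consumes an async generator with for await...of (nothing here is specific to LangChainJS):
// Minimal sketch of async-generator semantics; not specific to LangChainJS.
async function* countTo(n: number): AsyncGenerator<number> {
  for (let i = 1; i <= n; i++) {
    // Execution pauses at each yield until the consumer requests the next value.
    yield i;
  }
}

for await (const value of countTo(3)) {
  console.log(value); // prints 1, 2, 3 as the values are produced
}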
Please note that this modification would require a good understanding of the LangChainJS framework and its dependencies, and it may have unintended side effects if not done correctly. Always make sure to thoroughly test any changes you make to the codebase.
If you're looking for a more straightforward way to add streaming to a ConversationChain, you might want to consider using the call method with the callbacks option, as suggested in issue #3131. However, this would depend on whether the ConversationChain class has a similar interface to the ChatOpenAI class.
Here's a hypothetical example of how you could use the call method with the callbacks option:
import { ConversationChain } from "langchain/conversation_models";
import { HumanMessage } from "langchain/schema";
const chain = new ConversationChain({
maxTokens: 25,
streaming: true,
});
const response = await chain.call([new HumanMessage("Tell me a joke.")], {
callbacks: [
{
handleLLMNewToken(token: string) {
console.log({ token });
},
},
],
});
console.log(response);
This code assumes that the ConversationChain class has a similar interface to the ChatOpenAI class. If it doesn't, you would need to adjust the code accordingly. However, without more information about the ConversationChain class, I cannot provide a more accurate answer.
I hope this helps! If you have any further questions, feel free to ask.
Best, Dosu
Sources
- RetrievalQAChain & VectorDBQAChain doesn't supports streaming replies
- Streaming mechanism for ChatBedrock Class still not work.
- ChatBedrock streaming
- examples/src/models/chat/chat_streaming.ts
- langchain/src/chains/conversation.ts
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot Thanks for providing these details. However, modifying the call method doesn't make much sense, since it would mean changing internal library files, which is not desirable when updating. Your second example doesn't seem right either: I have not been able to set streaming: true for my ChatOllama or ConversationChain. Currently my chain and model are as follows:
const chat = new ChatOllama({
  baseUrl: "http://localhost:11434",
  model: "neural-chat",
  temperature: 0.5,
});

const chain = new ConversationChain({
  memory: new BufferWindowMemory({
    returnMessages: true,
    memoryKey: "history",
    k: 3,
    chatHistory: new ChatMessageHistory(pastMessages),
  }),
  prompt: chatPrompt,
  llm: chat,
});

const result = await chain.call({
  input: text,
});
As you can see, that is the entire configuration of my chain. What is the best way to add streaming in this case? So far I haven't found how to achieve it.
🤖
Based on the issues and solutions I found in the LangChainJS repository, it seems that the .stream() method currently only works for expression language sequences, not for ConversationChain or ChatOllama.
To enable streaming in a ConversationChain, you can follow the same pattern as shown in the example for the OpenAI class. You need to pass streaming: true to the constructor and provide a callback for the handleLLMNewToken event. Here's how you can do it:
import { ConversationChain } from "langchain/llms/conversationchain";
const chat = new ConversationChain({
streaming: true,
callbacks: [
{
handleLLMNewToken(token: string) {
process.stdout.write(token);
},
},
],
});
await chat.call("Write me a song about sparkling water.");
This code will enable streaming for the ConversationChain instance and print each new token to standard output as it is generated. If this doesn't work, it might be a bug in the LangChainJS framework or a problem with your specific setup.
For more details, you can refer to the following links:
- Usage in streaming mode
- Expression language
- Streaming mechanism for ChatBedrock Class still not work
- ChatBedrock streaming
- Streaming not working when using Ollama + RetrievalQAChain
I hope this helps! If you have any other questions, feel free to ask.
Sources
- ChatBedrock streaming
- Add Streaming mechanism for ChatBedrock Class
- Streaming not working when using Ollama + RetrievalQAChain
- examples/src/models/llm/llm_streaming_stdout.ts
- libs/langchain-mistralai/src/chat_models.ts
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
You can assign streaming=True in the ChatOpenAI definition in Python. I am not sure about ChatOllama.
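For reference, the LangChain.js equivalent looks roughly like this (a minimal sketch, assuming the @langchain/openai package and an OPENAI_API_KEY in the environment):
import { ChatOpenAI } from "@langchain/openai";

// Sketch: enable token streaming on the model itself and handle each token
// in the handleLLMNewToken callback.
const model = new ChatOpenAI({
  modelName: "gpt-4",
  temperature: 0,
  streaming: true,
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        process.stdout.write(token);
      },
    },
  ],
});

await model.invoke("Tell me a joke.");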
Hmm, you can try using LCEL:
import { RunnableSequence } from "@langchain/core/runnables";

// `prompt` and `model` are the ChatPromptTemplate and chat model defined earlier.
const chain = RunnableSequence.from([
  {
    question: (input: { question: string }) => input.question,
  },
  prompt,
  model,
  // new StringOutputParser()
]);

const stream = await chain.stream(
  { question: "hello" },
  {
    callbacks: [
      {
        handleLLMNewToken(token: string) {
          console.log({ token });
        },
      },
    ],
  }
);

// Drain the stream; the tokens are logged by the callback above.
for await (const chunk of stream) {
  // Each `chunk` is also available here if you prefer to consume it directly.
}
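A note on the commented-out parser: if you add a StringOutputParser at the end of the sequence, each chunk yielded by .stream() is already a plain string, so you can consume the stream directly instead of (or in addition to) using the callback. A sketch reusing the same prompt and model, and assuming the prompt takes a question variable:
import { StringOutputParser } from "@langchain/core/output_parsers";

// Same pipeline, but with an output parser so the streamed chunks are strings.
const stringChain = RunnableSequence.from([
  { question: (input: { question: string }) => input.question },
  prompt,
  model,
  new StringOutputParser(),
]);

const textStream = await stringChain.stream({ question: "hello" });
for await (const chunk of textStream) {
  process.stdout.write(chunk); // each chunk is a plain string token
}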
This took me a little while to piece together, but FWIW here's what I ended up with:
import { ChatOllama } from '@langchain/community/chat_models/ollama'
import { AIMessage, HumanMessage } from '@langchain/core/messages'
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts'
import { ConversationChain } from 'langchain/chains'
import { BufferMemory, ChatMessageHistory } from 'langchain/memory'
const model = new ChatOllama({
  baseUrl: 'http://localhost:11434', // Default value
  model: 'llama3:latest',
})

const prompt = ChatPromptTemplate.fromMessages([
  ['system', 'You are a helpful chatbot'],
  new MessagesPlaceholder('history'),
  ['human', '{input}'],
])

const memory = new BufferMemory({
  memoryKey: 'history',
  returnMessages: true,
  chatHistory: new ChatMessageHistory([]),
})

const chain = new ConversationChain({
  llm: model,
  prompt: prompt,
  memory: memory,
})

chain.stream(
  { input: "What's the biggest moon in the solar system?" },
  {
    callbacks: [
      {
        handleLLMNewToken: (data) => {
          console.log(data)
        },
      },
    ],
  }
).then(() => {
  console.log("done")
})
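A small variation on the snippet above, in case it helps: the tokens from handleLLMNewToken can be accumulated into the full response (or forwarded to a client) while the returned stream is drained so the chain and its memory finish updating. This is only a sketch built on the same chain:
// Sketch: accumulate streamed tokens while draining the stream returned by chain.stream().
let full = ''
const stream = await chain.stream(
  { input: "What's the biggest moon in the solar system?" },
  {
    callbacks: [
      {
        handleLLMNewToken: (token: string) => {
          full += token // accumulate, and/or forward to a websocket/SSE response
        },
      },
    ],
  }
)
for await (const chunk of stream) {
  // Legacy chains typically yield the final result here, not individual tokens.
}
console.log(full)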
LCEL is a great improvement over the old way. I happened to stumble upon a clean solution for streaming with LLMChain.
CHAT_LLM = ChatOpenAI(temperature=0.0, model="gpt-4", streaming=True)
chain = prompt | CHAT_LLM

async for chunk in chain.astream(query):
    print(chunk.content, end="|", flush=True)
    await websocket.send(json.dumps({"type": "stream", "data": chunk.content}))
Each chunk is streamed content. Here I am using a websocket to send the streaming data to a remote client.
Do you still have memory in this setting?
@jeanfredd No, I did not use memory in this simple chain. I read somewhere that in LCEL, memory can be added in the prompt. Nowadays I prefer to keep chat history on the client side: if I need memory, I compose a message from the old history and the new query on the client and send it to the server. This approach is straightforward and flexible. For example, I am trying to compose a memory from only the past human queries; I think that might be enough to remind the AI of the context of the chat, and it will save me a lot of tokens.
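To illustrate that client-side approach, here is a minimal sketch (the message shape and the sendToServer function are hypothetical, not part of LangChain):
// Sketch: keep the history on the client and compose it with the new query.
declare function sendToServer(prompt: string): Promise<string> // hypothetical transport (fetch/websocket)

type ChatTurn = { role: 'human' | 'ai'; content: string }
const history: ChatTurn[] = []

async function ask(question: string): Promise<string> {
  // Only past human turns are included, to save tokens while keeping context.
  const pastQueries = history.filter((t) => t.role === 'human').map((t) => t.content)
  const composed = [...pastQueries, question].join('\n')

  const answer = await sendToServer(composed)
  history.push({ role: 'human', content: question }, { role: 'ai', content: answer })
  return answer
}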
The chain.stream() + handleLLMNewToken approach worked for me! But do you know if it is any less performant, since it looks like we're triggering the streaming via a callback rather than consuming it directly?