
Is it possible to add streaming to a ConversationChain?

Open gzuuus opened this issue 1 year ago • 9 comments

I currently have the following chain, and I was wondering if it is possible to stream the LLM response. I have checked the docs and tried using chain.stream, but it doesn't work; however, if I use chat.stream directly, it streams the response well. Can someone enlighten me on how to achieve this?

const chat = new ChatOllama({
  baseUrl: "http://localhost:11434",
  model: "neural-chat",
  temperature: 0.5,
});

const chain = new ConversationChain({
  memory: new BufferWindowMemory({
    returnMessages: true,
    memoryKey: "history",
    k: 3,
    chatHistory: new ChatMessageHistory(pastMessages),
  }),
  prompt: chatPrompt,
  llm: chat,
});

gzuuus · Dec 18 '23

🤖

Hey there @gzuuus! :wave: I'm Dosu, a friendly bot here to assist you while we wait for a human maintainer to join us. I specialize in solving bugs, answering questions, and even helping you become a contributor. Just remember, I'm here to help, so feel free to ask anything about your issue. Let's get this sorted out together! :robot: :wrench:

Based on the information provided, it seems like you're trying to add streaming to a ConversationChain in the LangChainJS framework. However, the ConversationChain class might not inherently support streaming in the same way as the ChatOllama class does.

The ConversationChain class is designed to process the user's input and generate an AI response when its call method is invoked. If you want to add streaming to this process, you would need to modify the underlying _call method to yield results instead of returning them all at once. That means turning _call into an async generator function, similar to the modification suggested in issue #3293.

Here's an example of how you could modify a chain's _call method to support streaming (this particular snippet is based on a retrieval-style chain, as the retriever and combineDocumentsChain references show):

/** @ignore */
async *_call(
  values: ChainValues,
  runManager?: CallbackManagerForChainRun
): AsyncGenerator<ChainValues> {
  if (!(this.inputKey in values)) {
    throw new Error(`Question key "${this.inputKey}" not found.`);
  }
  const question: string = values[this.inputKey];
  const docs = await this.retriever.getRelevantDocuments(
    question,
    runManager?.getChild("retriever")
  );
  const inputs = { question, input_documents: docs, ...values };
  const result = await this.combineDocumentsChain.call(
    inputs,
    runManager?.getChild("combine_documents")
  );
  if (this.returnSourceDocuments) {
    yield {
      ...result,
      sourceDocuments: docs,
    };
  }
  yield result;
}

In this modification, the _call method is now an async generator function, which means it can yield multiple values over time. This allows it to stream results back to the caller rather than returning them all at once. The yield keyword produces a value for the caller to consume, and the function then pauses until the caller requests the next value.
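For illustration, here is a minimal, self-contained sketch (plain TypeScript, not tied to any LangChainJS class) of how a caller consumes such an async generator with for await...of:

// Hypothetical standalone example of consuming an async generator.
async function* generateChunks(): AsyncGenerator<string> {
  for (const piece of ["Hello", ", ", "world", "!"]) {
    // Each yield hands one chunk to the caller and pauses until the next iteration.
    yield piece;
  }
}

async function main() {
  for await (const chunk of generateChunks()) {
    process.stdout.write(chunk); // prints the chunks as they arrive
  }
}

main();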

Please note that this modification would require a good understanding of the LangChainJS framework and its dependencies, and it may have unintended side effects if not done correctly. Always make sure to thoroughly test any changes you make to the codebase.

If you're looking for a more straightforward way to add streaming to a ConversationChain, you might want to consider using the call method with the callbacks option, as suggested in issue #3131. However, this would depend on whether the ConversationChain class has a similar interface to the ChatOpenAI class.

Here's a hypothetical example of how you could use the call method with the callbacks option:

import { ConversationChain } from "langchain/conversation_models";
import { HumanMessage } from "langchain/schema";

const chain = new ConversationChain({
  maxTokens: 25,
  streaming: true,
});

const response = await chain.call([new HumanMessage("Tell me a joke.")], {
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        console.log({ token });
      },
    },
  ],
});

console.log(response);

This code assumes that the ConversationChain class has a similar interface to the ChatOpenAI class. If it doesn't, you would need to adjust the code accordingly. However, without more information about the ConversationChain class, I cannot provide a more accurate answer.

I hope this helps! If you have any further questions, feel free to ask.

Best, Dosu


dosubot[bot] · Dec 18 '23

@dosu-bot Thanks for providing these details. However, modifying the call method doesn't make much sense, since it would mean changing internal library files, which is not desirable when updating. Your second example doesn't seem right either; I have not been able to set 'streaming: true' on my ChatOllama or ConversationChain. Currently my chain and model are as follows:

const chat = new ChatOllama({
  baseUrl: "http://localhost:11434",
  model: "neural-chat",
  temperature: 0.5,
});

const chain = new ConversationChain({
  memory: new BufferWindowMemory({
    returnMessages: true,
    memoryKey: "history",
    k: 3,
    chatHistory: new ChatMessageHistory(pastMessages),
  }),
  prompt: chatPrompt,
  llm: chat,
});

const result = await chain.call({
  input: text,
});

As you can see, that is the entire configuration of my chain and model. What is the best way to add streaming in this case? So far I haven't found how to achieve it.

gzuuus · Dec 19 '23

🤖

Based on the issues and solutions I found in the LangChainJS repository, it seems that the .stream() method is currently intended for LangChain Expression Language (LCEL) sequences and not for ConversationChain or ChatOllama.

To enable streaming in a ConversationChain, you can follow the same pattern as shown in the example for the OpenAI class. You need to pass streaming: true to the constructor and provide a callback for the handleLLMNewToken event. Here's how you can do it:

import { ConversationChain } from "langchain/llms/conversationchain";

const chat = new ConversationChain({
  streaming: true,
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        process.stdout.write(token);
      },
    },
  ],
});

await chat.call("Write me a song about sparkling water.");

This code will enable streaming for the ConversationChain instance and print each new token to the standard output as it is generated. If this doesn't work, it might be a bug in the LangChainJS framework or a problem with your specific setup.
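If ConversationChain does not accept streaming or callbacks in its constructor (as the poster reports), a hedged alternative is to attach the token callback to the ChatOllama model itself, which chat models do accept, and leave the chain unchanged. This is only a sketch, and it assumes ChatOllama emits handleLLMNewToken as it streams from the Ollama server (the same mechanism the working chain.stream example further down relies on):

import { ChatOllama } from "@langchain/community/chat_models/ollama";

// Sketch only: the callback lives on the model, so it should fire whenever the
// model is invoked, including when it is called through a ConversationChain.
const chat = new ChatOllama({
  baseUrl: "http://localhost:11434",
  model: "neural-chat",
  temperature: 0.5,
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        process.stdout.write(token);
      },
    },
  ],
});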


I hope this helps! If you have any other questions, feel free to ask.


dosubot[bot] · Dec 19 '23


You can assign streaming=True in the ChatOpenAI definition in Python. I am not sure about ChatOllama.

cfa532 · Feb 13 '24

Hmm, you can try using LCEL (LangChain Expression Language):

import { RunnableSequence } from "@langchain/core/runnables";
// import { StringOutputParser } from "@langchain/core/output_parsers";

const chain = RunnableSequence.from([
  {
    question: (input: { question: string }) => input.question,
  },
  prompt,
  model,
  // new StringOutputParser(),
]);

const stream = await chain.stream(
  { question: "hello" },
  {
    callbacks: [
      {
        handleLLMNewToken(token: string) {
          console.log({ token });
        },
      },
    ],
  }
);

// Drain the stream so the run completes; the tokens are logged by the callback above.
for await (const _chunk of stream) {
  // intentionally empty
}
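If the commented-out StringOutputParser stage is enabled, the stream yields plain strings, which can be written out directly instead of (or in addition to) using the callback. A small sketch, reusing the prompt and model assumed above:

import { StringOutputParser } from "@langchain/core/output_parsers";

const stringChain = RunnableSequence.from([prompt, model, new StringOutputParser()]);

const textStream = await stringChain.stream({ question: "hello" });
for await (const text of textStream) {
  process.stdout.write(text); // each chunk is already a string
}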

yhmjSky · Mar 06 '24

This took me a little while to piece together, but FWIW here's what I ended up with:

import { ChatOllama } from '@langchain/community/chat_models/ollama'
import { AIMessage, HumanMessage } from '@langchain/core/messages'
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts'
import { ConversationChain } from 'langchain/chains'
import { BufferMemory, ChatMessageHistory } from 'langchain/memory'

const model = new ChatOllama({
	baseUrl: 'http://localhost:11434', // Default value
	model: 'llama3:latest',
})

const prompt = ChatPromptTemplate.fromMessages([
	['system', 'You are a helpful chatbot'],
	new MessagesPlaceholder('history'),
	['human', '{input}'],
])

const memory = new BufferMemory({
	memoryKey: 'history',
	returnMessages: true,
	chatHistory: new ChatMessageHistory([]),
})

const chain = new ConversationChain({
	llm: model,
	prompt: prompt,
	memory: memory,
})

chain.stream(
	{ input: "What's the biggest moon in the solar system?" },
	{
		callbacks: [
			{
				handleLLMNewToken: (data) => {
					console.log(data)
				},
			},
		],
	}
).then(() => {
	console.log("done")
})

steveruizok · Apr 20 '24

LCEL is a great improvement over the old way. I happened to stumble upon a clean solution for streaming with LLMChain (in Python):

CHAT_LLM = ChatOpenAI(temperature=0.0, model="gpt-4",streaming=True)
chain = prompt | CHAT_LLM
async for chunk in chain.astream(query):
    print(chunk.content, end="|", flush=True)
    await websocket.send(json.dumps({"type": "stream", "data": chunk.content}))

Each chunk is streamed content. Here I am using a websocket to send the streaming data to a remote client.
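For the JS side of this thread, a roughly equivalent sketch might look like the following (assuming @langchain/openai, an LCEL pipeline built with pipe, and the ws package for the websocket; streamToClient is a hypothetical helper, not a LangChain API):

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { WebSocket } from "ws";

const llm = new ChatOpenAI({ modelName: "gpt-4", temperature: 0, streaming: true });
const prompt = ChatPromptTemplate.fromTemplate("{query}");
const chain = prompt.pipe(llm);

// Stream each chunk to a connected client as it arrives.
async function streamToClient(socket: WebSocket, query: string) {
  const stream = await chain.stream({ query });
  for await (const chunk of stream) {
    socket.send(JSON.stringify({ type: "stream", data: chunk.content }));
  }
}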

cfa532 · Apr 21 '24

@cfa532 Regarding the LCEL streaming example above: do you still have memory in this setting?

jeanfredd · Apr 26 '24

@jeanfredd No, I did not use memory in this simple chain. I read somewhere that in LCEL, memory can be added via the prompt. Now I prefer to keep chat history on the client side: if I need memory, I compose a message from the old history and the new query on the client and send it to the server. This approach is straightforward and flexible. For example, I am trying to compose a memory from only the past human queries; I think that might be enough to remind the AI of the context of the chat, and it will save me a lot of tokens.
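A minimal TypeScript sketch of that idea (the helper name and the choice to replay only past human turns are illustrative, not from any LangChain API):

import { HumanMessage, type BaseMessage } from "@langchain/core/messages";

// Hypothetical client-side helper: rebuild a cheap "memory" from past human
// queries only, then append the new query before sending it to the server.
function composeMessages(pastQueries: string[], newQuery: string): BaseMessage[] {
  const history = pastQueries.map((q) => new HumanMessage(q));
  return [...history, new HumanMessage(newQuery)];
}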


cfa532 · Apr 26 '24


@steveruizok This worked for me! But do you know if it is any less performant, since it looks like we're triggering the "streaming" via a callback rather than directly?

TowhidKashem · Aug 09 '24