langchainjs
Streaming causes LLM to always start answers with a rephrased version of the question
Describe the issue
When I enable streaming on the OpenAI model, it causes all answers to begin with a rephrased version of the question.
Example:
Question from user: "How far is the sun?"
Answer streamed from handleLLMNewToken: "What is the distance from Earth to the sun? I don't know."
It's worth noting that the answer streamed token-by-token by handleLLMNewToken is different from the response returned by await chain.call. The latter returns { text: " I don't know." }, which is the desired behavior. The problem is that this value can't be streamed. As far as I know, the streaming needs to happen from inside handleLLMNewToken, like this:
// Imports assumed for this snippet; exact paths differ between early 0.0.x releases
// (e.g. 'langchain/llms' vs 'langchain/llms/openai'):
import { OpenAI } from 'langchain/llms/openai';
import { CallbackManager } from 'langchain/callbacks';

const sendData = (data: string) => {
  res.write(`data: ${data}\n\n`);
};

const model = new OpenAI({
  openAIApiKey: process.env.OPENAI_API_KEY,
  streaming: true,
  callbackManager: CallbackManager.fromHandlers({
    async handleLLMNewToken(token: string) {
      console.log('handleLLMNewToken', token);
      sendData(JSON.stringify({ data: token })); // stream each token
    },
    async handleLLMStart(llm: any, prompts: string[]) {
      console.log('handleLLMStart');
    },
  }),
});
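For completeness, the sendData helper above assumes the response has already been set up as a Server-Sent Events stream. A minimal sketch of that setup, assuming a Next.js pages API route (i.e. a Node ServerResponse named res); the helper name is illustrative:

import type { NextApiResponse } from 'next';

// Hypothetical helper: sets the headers an SSE stream needs before the
// handleLLMNewToken callback starts writing `data:` frames via sendData.
function initSSE(res: NextApiResponse) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache, no-transform',
    Connection: 'keep-alive',
  });
}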
Environment
"langchain": "^0.0.51", "next": "13.3.0",
I'm getting this as well for the second, third, etc. questions that I ask. I was trying to follow the lex-gpt repo as a guide for edge streaming.
This seems to be due to using ConversationalRetrievalQAChain and chat history. handleLLMNewToken and handleLLMEnd are called twice, breaking streaming. Things seem to be working fine when I use RetrievalQAChain.
Related #842
happens here too.
Having this issue as well after updating to 0.0.61. Looks like handleLLMNewToken, when passed to the OpenAI LLM callbacks, is returning the tokens from the "standalone question", not the final result.
Getting the same issue
Related #603
I assume you're using the conversational QA chain. From the docs:
"It first combines the chat history and the question into a standalone question, then looks up relevant documents from the retriever, and then passes those documents and the question to a question answering chain to return a response."
So I guess it's prompting the LLM to get a summary question that's then fed (with the docs) to the LLM again?
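To make that flow concrete, here's a rough sketch of the two LLM calls a conversational retrieval chain makes, which is why the standalone question shows up in the token stream. This is not the actual library internals; all names below are illustrative:

// Illustrative only: both calls go through the same (streaming) LLM, so
// handleLLMNewToken fires for the rephrased question and for the answer.
async function conversationalRetrievalFlow(
  question: string,
  chatHistory: string[],
  callLLM: (prompt: string) => Promise<string>,   // the streaming LLM
  retrieve: (query: string) => Promise<string[]>  // the retriever
): Promise<string> {
  // Call 1: condense chat history + question into a standalone question.
  // These are the first tokens you see ("What is the distance from Earth to the sun?").
  const standaloneQuestion = await callLLM(
    `Rephrase as a standalone question.\nHistory:\n${chatHistory.join('\n')}\nQuestion: ${question}`
  );

  // Retrieval step: no LLM tokens are produced here.
  const docs = await retrieve(standaloneQuestion);

  // Call 2: answer over the retrieved documents. These are the tokens you actually
  // want to stream to the client ("I don't know." in the example above).
  return callLLM(`Context:\n${docs.join('\n')}\n\nQuestion: ${standaloneQuestion}\nAnswer:`);
}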
Workaround I've been using:
// Imports assumed (paths may vary across 0.0.x releases):
import { BaseCallbackHandler } from 'langchain/callbacks';
import { LLMResult } from 'langchain/schema';

// `chatHistory` is my app's chat-history collection, defined elsewhere.
class EventsHandler extends BaseCallbackHandler {
  name = 'EventsHandler';

  private writer: WritableStreamDefaultWriter<string>;
  private stream: WritableStream<string>;
  private question: string;
  private LLMCount = 0;

  constructor({ stream, question }: { stream: WritableStream<string>; question: string }) {
    super();
    this.writer = stream.getWriter();
    this.stream = stream;
    this.question = question;
  }

  // The first LLM call (when there is chat history) is the standalone-question
  // rephrasing, so its tokens should not be streamed to the client.
  isContextQuestion() {
    return this.LLMCount === 0 && chatHistory.size > 0;
  }

  async handleLLMNewToken(token: string) {
    if (!this.isContextQuestion()) {
      await this.writer.ready;
      return this.writer.write(token);
    }
  }

  async handleLLMEnd(output: LLMResult) {
    const result = output?.generations?.[0]?.[0]?.text; // final text of this LLM call (unused here)
    this.LLMCount = this.LLMCount + 1;
  }

  async handleLLMError(err: unknown) {
    this.writer.releaseLock();
    await this.stream.abort((err as Error).message);
  }
}
Can you elaborate on how to implement this workaround? I haven't been successful yet.
We're looking into the best way of solving this; we should have more info here soon.
The immediate fix for this issue is to do the following:
// construct your chain as before
const chain = ConversationalRetrievalQAChain.fromLLM(new ChatOpenAI({streaming: true, ...}), ...)
// after creating the chain override the LLM in the inner `questionGeneratorChain`
chain.questionGeneratorChain.llm = new ChatOpenAI()
// use the chain
We're working on a better solution
^ To elaborate on the above explanation, the key is to ensure that the LLM behind the questionGeneratorChain has streaming set to false, and that the LLM you pass into fromLLM has a handler that implements handleLLMNewToken (passed in as part of callbacks).
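Putting those two pieces together, a sketch of the full workaround might look like the following. Import paths and option names vary between 0.0.x releases, and vectorStore stands in for whatever store/retriever you already have:

import { ChatOpenAI } from 'langchain/chat_models/openai';
import { ConversationalRetrievalQAChain } from 'langchain/chains';
import { BaseCallbackHandler } from 'langchain/callbacks';

// Assumed to exist already, e.g. an HNSWLib or Pinecone vector store.
declare const vectorStore: { asRetriever(): any };

// Handler that receives tokens only from the streaming (answer) model.
const tokenHandler = BaseCallbackHandler.fromMethods({
  handleLLMNewToken(token: string) {
    process.stdout.write(token); // or write to your SSE/WebSocket stream
  },
});

// Streaming model used for the final answer, with the token handler attached.
const streamingModel = new ChatOpenAI({ streaming: true, callbacks: [tokenHandler] });

const chain = ConversationalRetrievalQAChain.fromLLM(
  streamingModel,
  vectorStore.asRetriever()
);

// Override the inner question generator with a non-streaming model, so the
// rephrased standalone question never reaches handleLLMNewToken.
chain.questionGeneratorChain.llm = new ChatOpenAI({ streaming: false });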
@nfcampos I'm assuming this workaround won't work in the context of an agent executor (without reimplementing a lot), since it seems to use the same LLM for each tool call? I was able to put together a hacky filter to get the final output tokens from the callback, for anyone interested:
import { BaseCallbackHandler } from 'langchain/callbacks';
import { EventController } from './event_controller';

// Buffers streamed tokens and only publishes the ones inside the agent's
// `"action_input"` for the "Final Answer" action, so tokens from intermediate
// tool calls are filtered out.
export class CallbackHandler extends BaseCallbackHandler {
  name = 'CallbackHandler';

  stream = new EventController<string>();

  private _buffer = '';
  private _isInsideActionInput = false;
  private _hasFoundFinalAnswer = false;
  private readonly _finalAnswerString = '"action": "Final Answer"';
  private readonly _outputString = '"action_input": "';

  handleLLMEnd() {
    if (this._hasFoundFinalAnswer) {
      this.stream.end();
    }
  }

  handleAgentEnd() {
    this.stream.end();
  }

  handleLLMNewToken(token: string) {
    this._buffer += token;
    if (this._isInsideActionInput) {
      this._streamBuffer();
    } else if (!this._hasFoundFinalAnswer) {
      this._findFinalAnswer();
    } else {
      this._findOutput();
    }
  }

  // Once the final-answer marker has been seen, look for the start of `"action_input": "`.
  private _findOutput() {
    const startIndex = this._buffer.indexOf(this._outputString);
    if (startIndex !== -1) {
      this._isInsideActionInput = true;
      this._buffer = this._buffer.slice(startIndex + this._outputString.length);
    } else {
      // Keep only a tail long enough to match a marker that is split across tokens.
      this._buffer = this._buffer.slice(-this._outputString.length + 1);
    }
  }

  // Look for the `"action": "Final Answer"` marker in the buffered tokens.
  private _findFinalAnswer() {
    const isFinalIndex = this._buffer.indexOf(this._finalAnswerString);
    if (isFinalIndex !== -1) {
      this._hasFoundFinalAnswer = true;
      this._buffer = '';
    } else {
      this._buffer = this._buffer.slice(-this._finalAnswerString.length + 1);
    }
  }

  // Publish buffered text up to the closing (unescaped) quote of the action input.
  private _streamBuffer() {
    const endIndex = this._buffer.search(/(?<!\\)"/);
    if (endIndex !== -1) {
      const value = this._buffer.slice(0, endIndex);
      this.stream.publish(value);
      this._isInsideActionInput = false;
      this._buffer = this._buffer.slice(endIndex + 1);
    } else {
      this.stream.publish(this._buffer);
      this._buffer = '';
    }
  }
}
Seeing the same issue; I applied the workaround and it seems to work. But since it uses a non-streaming OpenAI client, it triggers the error Refused to set unsafe header "User-Agent" for me (I'm working on an Electron app that calls OpenAI directly). The streaming configuration does not trigger this error.
In the chain's handlers, I differentiate the source of tokens based on the chain's name to determine which tokens should be sent to the client. Below is an example:
// `subscriber` comes from my own streaming setup (e.g. an observable/SSE subscriber).
let isStuffDocumentsChain = false

const handlers = BaseCallbackHandler.fromMethods({
  handleChainStart(chain) {
    // Only the combine-documents chain produces the final answer tokens.
    if (chain.name === 'stuff_documents_chain') {
      isStuffDocumentsChain = true
    }
  },
  handleLLMNewToken(token) {
    if (isStuffDocumentsChain) {
      subscriber.next(token)
    }
  },
  handleChainEnd() {
    isStuffDocumentsChain = false
  },
})
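For context, a handler built this way could then be passed when invoking the chain. This is a sketch that assumes chain is a ConversationalRetrievalQAChain as above and that your version supports passing callbacks as the second argument of call:

const question = 'How far is the sun?'
await chain.call({ question, chat_history: [] }, [handlers])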
Any update on this one? It's still giving me rephrased questions.
If you're looking to stream the final result using the ConversationalRetrievalQAChain() along with Pinecone, I've set up a deployed example that demonstrates this functionality. You can check it out here: Example.
GitHub code: Code.
Furthermore, if you encounter any difficulties while working with the ConversationalRetrievalQAChain(), please provide the relevant repository or complete context that you're working with. This will help quickly identify and address any bugs or issues you might be facing, ensuring a more efficient debugging process.
Hi, @jacob-ruiz! I'm Dosu, and I'm helping the langchainjs team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue you reported is about enabling streaming on the OpenAI model, which causes all answers to start with a rephrased version of the question. Users have reported this issue and provided workarounds, such as overriding the LLM in the inner questionGeneratorChain. The maintainers have acknowledged the issue and provided a temporary fix by overriding the LLM. They are also actively working on a better solution.
Now, I'd like to ask if this issue is still relevant to the latest version of the langchainjs repository. If it is, please let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your contribution to langchainjs! Let me know if there's anything else I can assist you with.