Distinguish between tool call generations and final output streams with Anthropic chat models
Currently, there's no good guidance on distinguishing between an Anthropic model's generated tool calls and its plain text output.
This makes, for example, streaming back only the final agent response difficult and hacky.
See: https://github.com/langchain-ai/langgraphjs/issues/310
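For reference, one way to approximate this today is to inspect each streamed AIMessageChunk directly: tool-call fragments surface on tool_call_chunks, while plain text lands in content. The sketch below is illustrative only; the model name and weather tool are placeholders, and the content-block handling assumes Anthropic's array-style content.

// Sketch: separating plain-text output from tool-call fragments in a streamed response.
import { ChatAnthropic } from "@langchain/anthropic";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// Illustrative tool so the model has something to call.
const getWeather = tool(async ({ city }) => `It is sunny in ${city}.`, {
  name: "get_weather",
  description: "Look up the current weather for a city",
  schema: z.object({ city: z.string() }),
});

const model = new ChatAnthropic({ model: "claude-3-5-sonnet-20240620" }).bindTools([getWeather]);

const stream = await model.stream("What's the weather in Paris?");
for await (const chunk of stream) {
  // Tool-call fragments show up here; skip them when relaying text to the user.
  if (chunk.tool_call_chunks?.length) continue;
  // Anthropic may emit content as a string or as an array of content blocks.
  const text =
    typeof chunk.content === "string"
      ? chunk.content
      : chunk.content
          .filter((block: any) => block.type === "text" || block.type === "text_delta")
          .map((block: any) => block.text ?? "")
          .join("");
  if (text) process.stdout.write(text);
}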
Hi @jacoblee93, is there any progress on this bug?
I spent some time checking the code and noticed that the behavior of the OpenAI (@langchain/openai), Anthropic (@langchain/anthropic), and some other integration packages seems to differ. Could you help clarify what the standard behavior should be?
- In the OpenAI package, when handleLLMNewToken is called, a chunk is passed, but in the Anthropic and other packages it is not. (If we could obtain the chunk, it would be possible to differentiate between regular text and tool calls; a minimal handler to check this is sketched after this list.) https://github.com/langchain-ai/langchainjs/blob/42631e4e4d83f0f5c2f9fed9893e9fd081eaca30/libs/langchain-openai/src/chat_models.ts#L1169-L1177 https://github.com/langchain-ai/langchainjs/blob/42631e4e4d83f0f5c2f9fed9893e9fd081eaca30/libs/langchain-anthropic/src/chat_models.ts#L821
- In the OpenAI package, when the model streams out the tool_call parameter, the token received by handleLLMNewToken is an empty string, whereas in the Anthropic package handleLLMNewToken does receive the tool_call parameter token.
- When receiving the tool_call parameter, should the on_chat_model_stream event be triggered?
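The asymmetry above is easy to observe with a minimal diagnostic handler (a sketch for verification, not a fix) that just logs whether the integration passed a chunk alongside the token:

// Diagnostic sketch: logs whether `fields.chunk` was passed with each token.
// With @langchain/openai it is populated; per the report above, the Anthropic
// integration currently calls handleLLMNewToken without it.
import {
  BaseCallbackHandler,
  NewTokenIndices,
  HandleLLMNewTokenCallbackFields,
} from "@langchain/core/callbacks/base";

class TokenInspector extends BaseCallbackHandler {
  name = "token_inspector";

  async handleLLMNewToken(
    token: string,
    _idx: NewTokenIndices,
    _runId: string,
    _parentRunId?: string,
    _tags?: string[],
    fields?: HandleLLMNewTokenCallbackFields
  ) {
    console.log({ token, hasChunk: fields?.chunk !== undefined });
  }
}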
Any update, @jacoblee93? @renxinyan's suggestion would solve the problem when using a BaseCallbackHandler.
Workaround with a Monkey Patch on _streamResponseChunks
// Third-party imports
import { ChatAnthropicMessages } from '@langchain/anthropic';
import { BaseMessage } from '@langchain/core/messages';
import { ChatGenerationChunk } from '@langchain/core/outputs';
import { CallbackManagerForLLMRun } from '@langchain/core/callbacks/manager';

const originalStreamResponseChunks = ChatAnthropicMessages.prototype._streamResponseChunks;

// Monkey patch: re-emit every streamed chunk through handleLLMNewToken with the
// chunk attached in `fields`, mirroring what @langchain/openai already does. This
// lets callback handlers inspect the chunk (e.g. its tool_call_chunks) to tell
// tool-call fragments apart from plain text.
ChatAnthropicMessages.prototype._streamResponseChunks = async function* (
  messages: BaseMessage[],
  options: ChatAnthropicMessages["ParsedCallOptions"],
  runManager?: CallbackManagerForLLMRun
): AsyncGenerator<ChatGenerationChunk> {
  const generator = originalStreamResponseChunks.call(this, messages, options, runManager);
  for await (const chunk of generator) {
    await runManager?.handleLLMNewToken(
      chunk.text, // ChatGenerationChunk exposes the token text as `text`, not `token`
      undefined,
      undefined,
      undefined,
      undefined,
      { chunk }
    );
    yield chunk;
  }
};
To implement it you can use something like this as the CallbackHandler:
// Third-party imports
import {
  BaseCallbackHandler,
  NewTokenIndices,
  HandleLLMNewTokenCallbackFields,
} from "@langchain/core/callbacks/base";

export class StreamCallbackHandler extends BaseCallbackHandler {
  name = "stream_callback_handler";

  public async handleLLMNewToken(
    token: string,
    idx: NewTokenIndices,
    runId: string,
    parentRunId?: string,
    tags?: string[],
    fields?: HandleLLMNewTokenCallbackFields
  ) {
    // Calls from the unpatched code path don't carry `fields`; skip them so each
    // chunk is only handled once.
    const isOriginalImplementation = !fields;
    // Tool-call fragments carry tool_call_chunks on the chunk's message.
    // @ts-ignore - tool_call_chunks only exists on AIMessageChunk
    const isToolCall = fields && fields.chunk?.message?.tool_call_chunks?.length > 0;
    if (isOriginalImplementation || isToolCall) return;
    // What remains is plain text; forward it wherever you need it (the return
    // value itself is not consumed by the callback machinery).
    return fields.chunk?.text;
  }
}
And then:
const stream = await chain.stream(
  { input: "Hi! How are you?" },
  { callbacks: [new StreamCallbackHandler()] }
);
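For comparison, a similar effect can be achieved without a custom callback handler by consuming the chain's event stream and skipping chunks that carry tool_call_chunks. This is only a sketch: it assumes the v2 streamEvents API, reuses the same `chain` as above, and only handles string-typed content.

// Sketch: filter tool-call fragments out of the event stream instead of using callbacks.
const eventStream = chain.streamEvents(
  { input: "Hi! How are you?" },
  { version: "v2" }
);

for await (const event of eventStream) {
  if (event.event !== "on_chat_model_stream") continue;
  const chunk = event.data.chunk; // AIMessageChunk
  if (chunk.tool_call_chunks?.length) continue; // skip tool-call fragments
  if (typeof chunk.content === "string" && chunk.content) {
    process.stdout.write(chunk.content);
  }
}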
Hi, @jacoblee93. I'm Dosu, and I'm helping the LangChain JS team manage their backlog. I'm marking this issue as stale.
Issue Summary:
- The issue involves unclear guidance on distinguishing tool call generations from plain text outputs in Anthropic chat models.
- This complicates streaming final agent responses, with noted differences between OpenAI and Anthropic packages.
- @renxinyan highlighted differences in chunk handling and the triggering of on_chat_model_stream events.
- @luizzappa shared a workaround using a monkey patch on _streamResponseChunks to differentiate between regular text and tool calls.
Next Steps:
- Please confirm if this issue is still relevant with the latest version of the LangChain JS repository by commenting here.
- If there is no further activity, this issue will be automatically closed in 7 days.
Thank you for your understanding and contribution!
Leaving open
Hi, @jacoblee93. I'm Dosu, and I'm helping the LangChain JS team manage their backlog. I'm marking this issue as stale.
Issue Summary:
- The issue highlights the lack of guidance on differentiating tool call generations from plain text outputs in Anthropic chat models.
- You noted complications in streaming final agent responses due to this lack of clarity.
- @renxinyan pointed out behavioral differences between the OpenAI and Anthropic packages, especially in chunk handling and on_chat_model_stream events.
- @luizzappa suggested a workaround using a monkey patch on _streamResponseChunks.
- The issue remains unresolved, and you previously chose to keep it open for further discussion.
Next Steps:
- Please confirm if this issue is still relevant with the latest version of the LangChain JS repository. If it is, feel free to comment to keep the discussion open.
- If there is no further activity, the issue will be automatically closed in 7 days.
Thank you for your understanding and contribution!
Still an issue.