Distinguish between tool call generations and final output streams with Anthropic chat models
Currently, there's no good guidance on distinguishing between an Anthropic model's generated tool calls and its plain text output.
This makes, for example, streaming back only the final agent response difficult and hacky.
See: https://github.com/langchain-ai/langgraphjs/issues/310
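For reference, one way to approximate this today is to inspect each streamed AIMessageChunk directly: tool-call fragments surface on tool_call_chunks, while plain text lands in content. The sketch below is illustrative only; the model name and weather tool are placeholders, and the content-block handling assumes Anthropic's array-style content.

// Sketch: separating plain-text output from tool-call fragments in a streamed response.
import { ChatAnthropic } from "@langchain/anthropic";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// Illustrative tool so the model has something to call.
const getWeather = tool(async ({ city }) => `It is sunny in ${city}.`, {
  name: "get_weather",
  description: "Look up the current weather for a city",
  schema: z.object({ city: z.string() }),
});

const model = new ChatAnthropic({ model: "claude-3-5-sonnet-20240620" }).bindTools([getWeather]);

const stream = await model.stream("What's the weather in Paris?");
for await (const chunk of stream) {
  // Tool-call fragments show up here; skip them when relaying text to the user.
  if (chunk.tool_call_chunks?.length) continue;
  // Anthropic may emit content as a string or as an array of content blocks.
  const text =
    typeof chunk.content === "string"
      ? chunk.content
      : chunk.content
          .filter((block: any) => block.type === "text" || block.type === "text_delta")
          .map((block: any) => block.text ?? "")
          .join("");
  if (text) process.stdout.write(text);
}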
Hi @jacoblee93, is there any progress on this bug?
I spent some time checking the code and noticed that the behavior of the OpenAI (@langchain/openai), Anthropic (@langchain/anthropic), and some other integration packages seems to differ. Could you help clarify what the standard behavior should be?
- In the OpenAI package, when handleLLMNewToken is called, a chunk is passed, but in the Anthropic and other packages it is not. (If we could obtain the chunk, it would be possible to differentiate between regular text and tool calls; a minimal handler to check this is sketched after this list.) https://github.com/langchain-ai/langchainjs/blob/42631e4e4d83f0f5c2f9fed9893e9fd081eaca30/libs/langchain-openai/src/chat_models.ts#L1169-L1177 https://github.com/langchain-ai/langchainjs/blob/42631e4e4d83f0f5c2f9fed9893e9fd081eaca30/libs/langchain-anthropic/src/chat_models.ts#L821
- In the OpenAI package, when the model streams out the tool_call parameter, the token received by handleLLMNewToken is an empty string, whereas in the Anthropic package handleLLMNewToken does receive the tool_call parameter token.
- When receiving the tool_call parameter, should the on_chat_model_stream event be triggered?
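The asymmetry above is easy to observe with a minimal diagnostic handler (a sketch for verification, not a fix) that just logs whether the integration passed a chunk alongside the token:

// Diagnostic sketch: logs whether `fields.chunk` was passed with each token.
// With @langchain/openai it is populated; per the report above, the Anthropic
// integration currently calls handleLLMNewToken without it.
import {
  BaseCallbackHandler,
  NewTokenIndices,
  HandleLLMNewTokenCallbackFields,
} from "@langchain/core/callbacks/base";

class TokenInspector extends BaseCallbackHandler {
  name = "token_inspector";

  async handleLLMNewToken(
    token: string,
    _idx: NewTokenIndices,
    _runId: string,
    _parentRunId?: string,
    _tags?: string[],
    fields?: HandleLLMNewTokenCallbackFields
  ) {
    console.log({ token, hasChunk: fields?.chunk !== undefined });
  }
}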
Any update, @jacoblee93? @renxinyan's suggestion would solve the problem when using a BaseCallbackHandler.
Workaround with a Monkey Patch on _streamResponseChunks
// Third-party imports
import { ChatAnthropicMessages } from '@langchain/anthropic';
import { BaseMessage } from '@langchain/core/messages';
import { ChatGenerationChunk } from '@langchain/core/outputs';
import { CallbackManagerForLLMRun } from '@langchain/core/callbacks/manager';

const originalStreamResponseChunks = ChatAnthropicMessages.prototype._streamResponseChunks;

// Monkey patch: re-emit every streamed chunk through handleLLMNewToken with the
// chunk attached in `fields`, mirroring what @langchain/openai already does. This
// lets callback handlers inspect the chunk (e.g. its tool_call_chunks) to tell
// tool-call fragments apart from plain text.
ChatAnthropicMessages.prototype._streamResponseChunks = async function* (
  messages: BaseMessage[],
  options: ChatAnthropicMessages["ParsedCallOptions"],
  runManager?: CallbackManagerForLLMRun
): AsyncGenerator<ChatGenerationChunk> {
  const generator = originalStreamResponseChunks.call(this, messages, options, runManager);
  for await (const chunk of generator) {
    await runManager?.handleLLMNewToken(
      chunk.text, // ChatGenerationChunk exposes the token text as `text`, not `token`
      undefined,
      undefined,
      undefined,
      undefined,
      { chunk }
    );
    yield chunk;
  }
};
To implement it you can use something like this as the CallbackHandler:
// Third-party imports
import {
  BaseCallbackHandler,
  NewTokenIndices,
  HandleLLMNewTokenCallbackFields,
} from "@langchain/core/callbacks/base";

export class StreamCallbackHandler extends BaseCallbackHandler {
  name = "stream_callback_handler";

  public async handleLLMNewToken(
    token: string,
    idx: NewTokenIndices,
    runId: string,
    parentRunId?: string,
    tags?: string[],
    fields?: HandleLLMNewTokenCallbackFields
  ) {
    // Calls from the unpatched code path don't carry `fields`; skip them so each
    // chunk is only handled once.
    const isOriginalImplementation = !fields;
    // Tool-call fragments carry tool_call_chunks on the chunk's message.
    // @ts-ignore - tool_call_chunks only exists on AIMessageChunk
    const isToolCall = fields && fields.chunk?.message?.tool_call_chunks?.length > 0;
    if (isOriginalImplementation || isToolCall) return;
    // What remains is plain text; forward it wherever you need it (the return
    // value itself is not consumed by the callback machinery).
    return fields.chunk?.text;
  }
}
And then:
const stream = await chain.stream(
  { input: "Hi! How are you?" },
  { callbacks: [new StreamCallbackHandler()] }
);
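For comparison, a similar effect can be achieved without a custom callback handler by consuming the chain's event stream and skipping chunks that carry tool_call_chunks. This is only a sketch: it assumes the v2 streamEvents API, reuses the same `chain` as above, and only handles string-typed content.

// Sketch: filter tool-call fragments out of the event stream instead of using callbacks.
const eventStream = chain.streamEvents(
  { input: "Hi! How are you?" },
  { version: "v2" }
);

for await (const event of eventStream) {
  if (event.event !== "on_chat_model_stream") continue;
  const chunk = event.data.chunk; // AIMessageChunk
  if (chunk.tool_call_chunks?.length) continue; // skip tool-call fragments
  if (typeof chunk.content === "string" && chunk.content) {
    process.stdout.write(chunk.content);
  }
}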
Hi, @jacoblee93. I'm Dosu, and I'm helping the LangChain JS team manage their backlog. I'm marking this issue as stale.
Issue Summary:
- The issue involves unclear guidance on distinguishing tool call generations from plain text outputs in Anthropic chat models.
- This complicates streaming final agent responses, with noted differences between OpenAI and Anthropic packages.
- @renxinyan highlighted differences in chunk handling and the triggering of on_chat_model_stream events.
- @luizzappa shared a workaround using a monkey patch on _streamResponseChunks to differentiate between regular text and tool calls.
Next Steps:
- Please confirm if this issue is still relevant with the latest version of the LangChain JS repository by commenting here.
- If there is no further activity, this issue will be automatically closed in 7 days.
Thank you for your understanding and contribution!
Leaving open
Hi, @jacoblee93. I'm Dosu, and I'm helping the LangChain JS team manage their backlog. I'm marking this issue as stale.
Issue Summary:
- The issue highlights the lack of guidance on differentiating tool call generations from plain text outputs in Anthropic chat models.
- You noted complications in streaming final agent responses due to this lack of clarity.
- @renxinyan pointed out behavioral differences between the OpenAI and Anthropic packages, especially in chunk handling and on_chat_model_stream events.
- @luizzappa suggested a workaround using a monkey patch on _streamResponseChunks.
- The issue remains unresolved, and you previously chose to keep it open for further discussion.
Next Steps:
- Please confirm if this issue is still relevant with the latest version of the LangChain JS repository. If it is, feel free to comment to keep the discussion open.
- If there is no further activity, the issue will be automatically closed in 7 days.
Thank you for your understanding and contribution!
Still an issue.