
Distinguish between tool call generations and final output streams with Anthropic chat models

Open jacoblee93 opened this issue 1 year ago • 7 comments

Currently, there's no good guidance on distinguishing between an Anthropic model's generated tool call vs. a plain text output.

This makes tasks like streaming back final agent responses difficult and hacky.

See: https://github.com/langchain-ai/langgraphjs/issues/310
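For context, here is a minimal sketch of the kind of check callers currently have to write by hand when consuming .stream() directly. The tool, model name, and prompt are purely illustrative, and relying on tool_call_chunks on the streamed AIMessageChunk is an assumption about a workable heuristic, not documented guidance:

// Illustrative sketch only: tool, model, and prompt are placeholders.
import { ChatAnthropic } from "@langchain/anthropic";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

const getWeather = tool(async ({ city }) => `It is sunny in ${city}.`, {
  name: "get_weather",
  description: "Get the weather for a city",
  schema: z.object({ city: z.string() }),
});

const model = new ChatAnthropic({ model: "claude-3-5-sonnet-20240620" }).bindTools([
  getWeather,
]);

const stream = await model.stream("What's the weather in SF?");
for await (const chunk of stream) {
  // Assumption: a non-empty tool_call_chunks array marks a tool call fragment.
  if (chunk.tool_call_chunks && chunk.tool_call_chunks.length > 0) {
    // Tool call fragment: don't surface this to the end user.
  } else if (typeof chunk.content === "string" && chunk.content.length > 0) {
    // Plain text fragment: safe to stream back as the final answer.
    process.stdout.write(chunk.content);
  }
}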

jacoblee93 avatar Aug 12 '24 14:08 jacoblee93

Hi @jacoblee93, is there any progress on this bug?

I spent some time checking the code and noticed that the behavior of the OpenAI (@langchain/openai), Anthropic (@langchain/anthropic), and some other integration packages seems to differ. Could you help clarify what the standard behavior should be?

  1. In the OpenAI package, when handleLLMNewToken is called, a chunk is passed, but in the Anthropic and other packages it is not. (If we could obtain the chunk, it would be possible to differentiate between regular text and tool calls; see the sketch after this list.)
     https://github.com/langchain-ai/langchainjs/blob/42631e4e4d83f0f5c2f9fed9893e9fd081eaca30/libs/langchain-openai/src/chat_models.ts#L1169-L1177
     https://github.com/langchain-ai/langchainjs/blob/42631e4e4d83f0f5c2f9fed9893e9fd081eaca30/libs/langchain-anthropic/src/chat_models.ts#L821

  2. In the OpenAI package, when the model streams out the tool_call parameter, the token received by handleLLMNewToken is an empty string, whereas in the Anthropic package, handleLLMNewToken does receive the tool_call parameter token.

  3. When receiving the tool_call parameter, should the on_chat_model_stream event be triggered?
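To make the first point concrete, here is a hedged sketch of the kind of callback handler that becomes possible once the chunk is forwarded in the fields argument (as the OpenAI package already does). The handler name and the write-to-stdout behavior are placeholders, not a proposed API:

import {
  BaseCallbackHandler,
  NewTokenIndices,
  HandleLLMNewTokenCallbackFields,
} from "@langchain/core/callbacks/base";
import { AIMessageChunk } from "@langchain/core/messages";

// Illustrative handler: only useful for integrations that pass `fields.chunk`.
class ChunkAwareHandler extends BaseCallbackHandler {
  name = "chunk_aware_handler";

  async handleLLMNewToken(
    token: string,
    _idx: NewTokenIndices,
    _runId: string,
    _parentRunId?: string,
    _tags?: string[],
    fields?: HandleLLMNewTokenCallbackFields
  ) {
    const chunk = fields?.chunk;
    // Only ChatGenerationChunk carries a message; the cast is needed because
    // `message` is typed as BaseMessageChunk, which does not declare tool_call_chunks.
    const message =
      chunk && "message" in chunk ? (chunk.message as AIMessageChunk) : undefined;
    const isToolCall = (message?.tool_call_chunks?.length ?? 0) > 0;
    if (message && !isToolCall) {
      process.stdout.write(token); // plain text only
    }
  }
}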

renxinyan avatar Aug 18 '24 07:08 renxinyan

Any update, @jacoblee93? @renxinyan's suggestion would solve the problem when using BaseCallbackHandler.

luizzappa avatar Aug 31 '24 21:08 luizzappa

Workaround with a Monkey Patch on _streamResponseChunks

// Third-party imports
import { ChatAnthropicMessages } from '@langchain/anthropic';
import { BaseMessage } from '@langchain/core/messages';
import { ChatGenerationChunk } from '@langchain/core/outputs';
import { CallbackManagerForLLMRun } from '@langchain/core/callbacks/manager';

const originalStreamResponseChunks = ChatAnthropicMessages.prototype._streamResponseChunks;

// Wrap the original generator so every streamed chunk is also forwarded to
// handleLLMNewToken with the chunk attached in the `fields` argument,
// mirroring what the OpenAI integration already does.
ChatAnthropicMessages.prototype._streamResponseChunks = async function* (
  messages: BaseMessage[],
  options: ChatAnthropicMessages["ParsedCallOptions"],
  runManager?: CallbackManagerForLLMRun
): AsyncGenerator<ChatGenerationChunk> {
  const generator = originalStreamResponseChunks.call(this, messages, options, runManager);

  for await (const chunk of generator) {
    await runManager?.handleLLMNewToken(
      chunk.text ?? "",
      undefined,
      undefined,
      undefined,
      undefined,
      { chunk }
    );
    yield chunk;
  }
};

To use it, you can write something like this as the CallbackHandler:

// Third-party imports
import {
  BaseCallbackHandler,
  NewTokenIndices,
  HandleLLMNewTokenCallbackFields,
} from "@langchain/core/callbacks/base";

export class StreamCallbackHandler extends BaseCallbackHandler {
  name = "stream_callback_handler";

  async handleLLMNewToken(
    token: string,
    idx: NewTokenIndices,
    runId: string,
    parentRunId?: string,
    tags?: string[],
    fields?: HandleLLMNewTokenCallbackFields
  ) {
    // Without the monkey patch, `fields` is not passed for Anthropic, so there
    // is nothing to inspect and the original behavior is kept.
    const isOriginalImplementation = !fields;
    // @ts-ignore `message` is typed as BaseMessageChunk, which does not declare tool_call_chunks
    const isToolCall = fields && fields.chunk?.message?.tool_call_chunks?.length > 0;
    if (isOriginalImplementation || isToolCall) return;

    // Only plain-text chunks reach this point; forward them to your output stream.
    return fields?.chunk?.text;
  }
}

And then:

const stream = await chain.stream(
  { input: "Hi! How are you?" },
  { callbacks: [new StreamCallbackHandler()] }
);
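As a related alternative (a sketch, assuming a recent @langchain/core with streamEvents v2, and not an officially documented pattern), you can skip callbacks entirely and filter on_chat_model_stream events, checking tool_call_chunks on the streamed message chunk; `chain` below is the same chain as above:

const eventStream = await chain.streamEvents(
  { input: "Hi! How are you?" },
  { version: "v2" }
);

for await (const event of eventStream) {
  if (event.event === "on_chat_model_stream") {
    const chunk = event.data?.chunk; // an AIMessageChunk from the chat model
    const isToolCall = (chunk?.tool_call_chunks?.length ?? 0) > 0;
    // Assumption: only non-tool-call, string-content chunks are user-facing text.
    if (!isToolCall && typeof chunk?.content === "string") {
      process.stdout.write(chunk.content);
    }
  }
}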

luizzappa avatar Aug 31 '24 22:08 luizzappa

Hi, @jacoblee93. I'm Dosu, and I'm helping the LangChain JS team manage their backlog. I'm marking this issue as stale.

Issue Summary:

  • The issue involves unclear guidance on distinguishing tool call generations from plain text outputs in Anthropic chat models.
  • This complicates streaming final agent responses, with noted differences between OpenAI and Anthropic packages.
  • @renxinyan highlighted differences in chunk handling and the triggering of on_chat_model_stream events.
  • @luizzappa shared a workaround using a Monkey Patch on _streamResponseChunks to differentiate between regular text and tool calls.

Next Steps:

  • Please confirm if this issue is still relevant with the latest version of the LangChain JS repository by commenting here.
  • If there is no further activity, this issue will be automatically closed in 7 days.

Thank you for your understanding and contribution!

dosubot[bot] avatar Nov 30 '24 16:11 dosubot[bot]

Leaving open

jacoblee93 avatar Nov 30 '24 16:11 jacoblee93

Hi, @jacoblee93. I'm Dosu, and I'm helping the LangChain JS team manage their backlog. I'm marking this issue as stale.

Issue Summary:

  • The issue highlights the lack of guidance on differentiating tool call generations from plain text outputs in Anthropic chat models.
  • You noted complications in streaming final agent responses due to this lack of clarity.
  • @renxinyan pointed out behavioral differences between OpenAI and Anthropic packages, especially in chunk handling and on_chat_model_stream events.
  • @luizzappa suggested a workaround using a Monkey Patch on _streamResponseChunks.
  • The issue remains unresolved, and you previously chose to keep it open for further discussion.

Next Steps:

  • Please confirm if this issue is still relevant with the latest version of the LangChain JS repository. If it is, feel free to comment to keep the discussion open.
  • If there is no further activity, the issue will be automatically closed in 7 days.

Thank you for your understanding and contribution!

dosubot[bot] avatar Mar 01 '25 16:03 dosubot[bot]

Still an issue.

luizzappa avatar Mar 07 '25 14:03 luizzappa