promptfoo Support tool use assertions for Claude

Hey, I'd like to work on support for tool use in Claude, which was just launched https://docs.anthropic.com/claude/docs/tool-use

Unfortunately, Anthropic didn't use the OpenAI pattern for supporting this and changed a few things.

Tool definition

here's the tool definition in claude's API.

[
  {
    "name": "get_stock_price",
    "description": "Get the current stock price for a given ticker symbol.",
    "input_schema": {
      "type": "object",
      "properties": {
        "ticker": {
          "type": "string",
          "description": "The stock ticker symbol, e.g. AAPL for Apple Inc."
        }
      },
      "required": ["ticker"]
    }
  }
]

Basically all that's changed here is that there's no base type of function and input_schema is changed from parameters. I think we can one to one map between anthropic and openai tool definitions. So I'm thinking we can support both the openai and anthropic tool definitions in anthropic.

Forcing tool use

Anthropic doesn't support forcing tool use unless you specifically reference the tool in a user message. Kind of a strange choice but I don't think there's anything to worry about here.

Tool responses

So, putting in the tool responses is a bit annoyingly different from the behavior of OpenAI.

They have tool_use responses as part of the content block that's sent back from Anthropic.

{
  "id": "msg_01Aq9w938a90dw8q",
  "model": "claude-3-opus-20240229",
  "stop_reason": "tool_use",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "<thinking>I need to use the get_weather, and the user wants SF, which is likely San Francisco, CA.</thinking>"
    },
    {
      "type": "tool_use",
      "id": "toolu_01A09q90qw90lq917835lq9",
      "name": "get_weather",
      "input": {"location": "San Francisco, CA", "unit": "celsius"}
    }
  ]
}

Then, you have to send these as part of a content block of a user message.

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
      "content": "65 degrees"
    }
  ]
}

is_error

you can also support giving an explicit flag that the tool had an error.

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
      "content": "ConnectionError: the weather service API is not available (HTTP 500)",
      "is_error": true
    }
  ]
}

There's no way to map this from openai right now so I think I'll just allow it as part of an anthropic response.

I'm going to give how to approach this from a high-level some more thought over the next day or two. I'm thinking directionally promptfoo should try to support the OpenAI prompt format as the lingua franca, and then maybe we can support anthropic specific prompts too?

Apr 06 '24 01:04 CamdenClark

Really appreciate the rundown of Claude tools, which I haven't had the chance to look into yet, and the thought you've put into next steps.

I'm thinking directionally promptfoo should try to support the OpenAI prompt format as the lingua franca, and then maybe we can support anthropic specific prompts too?

I agree with this approach, and in some cases we're starting to move in that direction for regular prompting too. It's still good to give the user fine-grained control over provider-specific prompts if they desire.

Apr 06 '24 12:04 typpo

Really appreciate the rundown of Claude tools, which I haven't had the chance to look into yet, and the thought you've put into next steps.

I'm thinking directionally promptfoo should try to support the OpenAI prompt format as the lingua franca, and then maybe we can support anthropic specific prompts too?

I agree with this approach, and in some cases we're starting to move in that direction for regular prompting too. It's still good to give the user fine-grained control over provider-specific prompts if they desire.

Awesome, will get to work on it.

Apr 07 '24 02:04 CamdenClark

Hey @CamdenClark, thanks again for raising this! We actually added support for tool use in Claude a while ago but forgot to update this issue - apologies for that! You can find an example of how to use it here: Tool Use Example.

I’m going to close the issue now since it's been resolved, but if you run into any problems or have further suggestions, feel free to reopen or create a new issue. Thanks again for your contribution and patience!

Oct 01 '24 06:10 mldangelo