fix(Bedrock): allow tools kwargs for AWS Bedrock Claude model

Open lambda-science opened this issue 1 year ago • 12 comments

Related Issues

  • fixes https://github.com/deepset-ai/haystack-core-integrations/issues/975

Proposed Changes:

Allow the tools kwarg for AWS Bedrock Anthropic models, because it is listed as a valid parameter here: https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages.html
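
A minimal sketch of what this enables, assuming AWS credentials are configured (the top_song tool mirrors the one used in the tests later in this thread; exact field values are illustrative):

    # Pass an Anthropic-format tool definition through generation_kwargs.
    from haystack.dataclasses import ChatMessage
    from haystack_integrations.components.generators.amazon_bedrock import AmazonBedrockChatGenerator

    tools = [
        {
            "name": "top_song",
            "description": "Get the most popular song played on a radio station.",
            "input_schema": {
                "type": "object",
                "properties": {"sign": {"type": "string", "description": "Radio station call sign."}},
                "required": ["sign"],
            },
        }
    ]

    generator = AmazonBedrockChatGenerator(model="anthropic.claude-3-haiku-20240307-v1:0")
    result = generator.run(
        messages=[ChatMessage.from_user("What is the most popular song on WZPZ?")],
        generation_kwargs={"tools": tools},
    )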

How did you test it?

I did not, as it's a tiny fix and I didn't really have time to set up the whole testing process.

lambda-science avatar Aug 13 '24 11:08 lambda-science

LGTM, but please have a look as well @davidsbatista; the context is in this Discord thread.

vblagoje avatar Aug 13 '24 13:08 vblagoje

@lambda-science would you please add a unit test? Not so much to add a deeply rigorous test case here, but to have a test that fails if tools is ever removed for whatever reason 🙏

vblagoje avatar Aug 13 '24 14:08 vblagoje
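
A hypothetical shape for such a guard test (constructor arguments and the exact assertion are illustrative; prepare_body is the adapter method that builds the request body):

    from haystack.dataclasses import ChatMessage
    from haystack_integrations.components.generators.amazon_bedrock.chat.chat_generator import (
        AnthropicClaudeChatAdapter,
    )

    def test_prepare_body_keeps_tools():
        adapter = AnthropicClaudeChatAdapter(generation_kwargs={})
        tools = [{"name": "top_song", "description": "Get the top song.", "input_schema": {"type": "object"}}]
        body = adapter.prepare_body(messages=[ChatMessage.from_user("hi")], tools=tools)
        # if `tools` is ever dropped from the allowed kwargs, this assertion fails
        assert body["tools"] == tools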

@lambda-science would you please add a unit test? Not so much to add a deeply rigorous test case here, but to have a test that fails if tools is ever removed for whatever reason 🙏

I will look into it tomorrow :)

lambda-science avatar Aug 13 '24 14:08 lambda-science

@lambda-science looks good to me; as @vblagoje mentioned, please just add a unit test.

davidsbatista avatar Aug 14 '24 09:08 davidsbatista

Hello,

I worked on adding a test for tool use with Bedrock, and multiple problems appeared (so it was a good idea). The current Bedrock implementation was processing the replies too strictly and dropping answers of type tool_use. So I had to modify the adapter to include tool_use replies in the ChatMessages by dumping the tool_use block as JSON content.
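
Roughly, the adapter change looks like this (a sketch; the helper name and exact ChatMessage construction are illustrative):

    import json
    from haystack.dataclasses import ChatMessage

    def _block_to_chat_message(block: dict, meta: dict) -> ChatMessage:
        """Sketch: keep tool_use content blocks instead of silently dropping them."""
        if block["type"] == "text":
            return ChatMessage.from_assistant(block["text"], meta=meta)
        if block["type"] == "tool_use":
            # serialize the whole tool_use block so it survives as message content
            return ChatMessage.from_assistant(json.dumps(block), meta=meta)
        msg = f"Unsupported content block type: {block['type']}"
        raise ValueError(msg)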

The issue is that the test is flaky; it either passes or crashes with:

    @component.output_types(replies=List[ChatMessage])
    def run(
        self,
        messages: List[ChatMessage],
        streaming_callback: Optional[Callable[[StreamingChunk], None]] = None,
        generation_kwargs: Optional[Dict[str, Any]] = None,
    ):
        """
        Generates a list of `ChatMessage` response to the given messages using the Amazon Bedrock LLM.

        :param messages: The messages to generate a response to.
        :param streaming_callback:
            A callback function that is called when a new token is received from the stream.
        :param generation_kwargs: Additional generation keyword arguments passed to the model.
        :returns: A dictionary with the following keys:
            - `replies`: The generated List of `ChatMessage` objects.
        """
        generation_kwargs = generation_kwargs or {}
        generation_kwargs = generation_kwargs.copy()

        streaming_callback = streaming_callback or self.streaming_callback
        generation_kwargs["stream"] = streaming_callback is not None

        # check if the prompt is a list of ChatMessage objects
        if not (
            isinstance(messages, list)
            and len(messages) > 0
            and all(isinstance(message, ChatMessage) for message in messages)
        ):
            msg = f"The model {self.model} requires a list of ChatMessage objects as a prompt."
            raise ValueError(msg)

        body = self.model_adapter.prepare_body(
            messages=messages, **{"stop_words": self.stop_words, **generation_kwargs}
        )
        try:
            if streaming_callback:
                response = self.client.invoke_model_with_response_stream(
                    body=json.dumps(body), modelId=self.model, accept="application/json", contentType="application/json"
                )
                response_stream = response["body"]
                replies = self.model_adapter.get_stream_responses(
                    stream=response_stream, streaming_callback=streaming_callback
                )
            else:
                response = self.client.invoke_model(
                    body=json.dumps(body), modelId=self.model, accept="application/json", contentType="application/json"
                )
                response_body = json.loads(response.get("body").read().decode("utf-8"))
                replies = self.model_adapter.get_responses(response_body=response_body)
        except ClientError as exception:
            msg = f"Could not inference Amazon Bedrock model {self.model} due: {exception}"
            raise AmazonBedrockInferenceError(msg) from exception

        # rename the meta key to be inline with OpenAI meta output keys
        for response in replies:
            if response.meta is not None and "usage" in response.meta:
>               response.meta["usage"]["prompt_tokens"] = response.meta["usage"].pop("input_tokens")
E               KeyError: 'input_tokens'

src\haystack_integrations\components\generators\amazon_bedrock\chat\chat_generator.py:209: KeyError

FAILED tests/test_chat_generator.py::TestAnthropicClaudeAdapter::test_tools_use[anthropic.claude-3-haiku-20240307-v1:0] - KeyError: 'input_tokens'

The reason is that this test produces two types of answers:

  1. Either the model answers with only ONE reply, of the tool_use type, like {'type': 'tool_use', 'id': 'toolu_bdrk_01NcAB4KKhQzdV6eX1ji1Hpk', 'name': 'top_song', 'input': {'sign': 'WZPZ'}} => TEST PASSES
  2. Or the model produces two replies:
  • One text-type answer, such as {'type': 'text', 'text': 'Here is the most popular song played on radio station WZPZ:'}
  • One tool_use-type answer as before: {'type': 'tool_use', 'id': 'toolu_bdrk_01NcAB4KKhQzdV6eX1ji1Hpk', 'name': 'top_song', 'input': {'sign': 'WZPZ'}}

But in the second case only one instance of metadata is present (prompt/completion/input/output tokens). I'm not entirely sure how to handle this; your input would be valuable. @vblagoje @davidsbatista

Also, I'm not sure what the standard for tool_use output (in ChatMessage) is. I simply dumped the whole reply as JSON; maybe that's not the right way to do it, but the test passes (copy/pasted from the Anthropic implementation -- non-Bedrock).

Also, this is 100% based on the InvokeModel way of calling tools; the new Converse API from Bedrock uses another tool spec input (model-provider agnostic). It has its advantages and drawbacks, let's say... for now, let's focus on making it work with the current InvokeModel implementation.
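
For reference only (not used by this PR): per the AWS docs, Converse wraps each tool in a provider-agnostic toolSpec, with the JSON schema nested under inputSchema.json, a different shape from the Anthropic-native tools list that InvokeModel takes:

    # Converse-style tool configuration (contrast with the `tools` list above).
    converse_tool_config = {
        "tools": [
            {
                "toolSpec": {
                    "name": "top_song",
                    "description": "Get the most popular song played on a radio station.",
                    "inputSchema": {
                        "json": {
                            "type": "object",
                            "properties": {"sign": {"type": "string"}},
                            "required": ["sign"],
                        }
                    },
                }
            }
        ]
    }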

lambda-science avatar Aug 14 '24 09:08 lambda-science

Maybe we need a way to force the model to discard the first text answer when a tool_use answer follows, so that we only output the tool_use reply? Because having two possible types of answers is a mess to handle from a developer's point of view.

Let's say you want your model to do function calling: you then have to check whether the replies you get are text or tool_use to extract the right JSON; you can't simply take the first answer and call it a day (like the new tool test does, taken directly from the Anthropic integration). See the sketch below.
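
Concretely, a caller today has to do something like this (a workaround sketch, not part of the integration's API):

    import json

    def extract_tool_use(replies):
        """Scan replies and return the first serialized tool_use block, if any."""
        for reply in replies:
            try:
                content = json.loads(reply.content)
            except (json.JSONDecodeError, TypeError):
                continue  # plain-text reply, not a serialized tool_use block
            if isinstance(content, dict) and content.get("type") == "tool_use":
                return content  # e.g. {'type': 'tool_use', 'name': 'top_song', ...}
        return None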

lambda-science avatar Aug 14 '24 09:08 lambda-science

@lambda-science please add it as a new test function

davidsbatista avatar Aug 14 '24 10:08 davidsbatista

@lambda-science please add it as a new test function

Hello, I'm not sure what you meant by this, as I already added the test as a new function?

On another note, I think I solved the flaky test. What I did: when the stop_reason is tool_use, we only return the content that has the tool_use type (effectively discarding the text-type content). Otherwise we return the chat messages as usual.

The test was flaky because, when normalizing meta keys, we were mutating a dictionary that was not a deep copy (meta). So with two replies, by the second reply the dictionary keys had already been modified! My change solves it in the sense that, if it's a tool_use, there will be only one reply.
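
Sketched out, the two fixes look roughly like this (illustrative, not the exact adapter code):

    import copy
    import json

    def get_responses_sketch(response_body: dict) -> list:
        """Return (content, meta) pairs from an Anthropic InvokeModel response body."""
        contents = response_body["content"]
        if response_body.get("stop_reason") == "tool_use":
            # keep only tool_use blocks, discarding any preamble text block,
            # so a tool call always yields exactly one reply
            contents = [c for c in contents if c["type"] == "tool_use"]
        replies = []
        for block in contents:
            text = json.dumps(block) if block["type"] == "tool_use" else block["text"]
            # deep-copy the shared metadata so renaming usage keys on one reply
            # cannot mutate the dict seen by the next one (the KeyError above)
            meta = copy.deepcopy({k: v for k, v in response_body.items() if k != "content"})
            replies.append((text, meta))
        return replies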

All tests are passing now.

lambda-science avatar Aug 14 '24 12:08 lambda-science

Hello, I'm not sure what you meant by this, as I already added the test as a new function?

sorry, I overlooked that you did it

davidsbatista avatar Aug 14 '24 14:08 davidsbatista

Ah, as a possible further improvement for this PR, we could/should also add the tool_choice option (with a test, in the scope of this function-calling improvement). I will work on this in the coming days: https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages.html
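
Per the linked docs, tool_choice is an object passed alongside tools, so through this integration it would go in generation_kwargs the same way (the top_song tool is the hypothetical one from earlier examples):

    # tool_choice can let the model decide ("auto"), force some tool ("any"),
    # or force one specific tool by name.
    generation_kwargs = {
        "tools": [{"name": "top_song", "description": "Get the top song.", "input_schema": {"type": "object"}}],
        "tool_choice": {"type": "tool", "name": "top_song"},
    }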

lambda-science avatar Aug 14 '24 14:08 lambda-science

Yes @lambda-science add it, your intuition is the best guideline here. I'll take a look tomorrow 🙏

vblagoje avatar Aug 14 '24 14:08 vblagoje

Yes @lambda-science add it, your intuition is the best guideline here. I'll take a look tomorrow 🙏

Everything should be good now, ready for review :) The tools and tool_choice params were added, with an integration test (you should run it in the CI/CD; I guess it doesn't run by default).

lambda-science avatar Aug 14 '24 16:08 lambda-science

I would be happy to get pinged when this is released on PyPI ✌️

lambda-science avatar Aug 21 '24 11:08 lambda-science

@lambda-science ping 🙂

julian-risch avatar Aug 21 '24 12:08 julian-risch