fix(Bedrock): allow tools kwargs for AWS Bedrock Claude model
Related Issues
- fixes https://github.com/deepset-ai/haystack-core-integrations/issues/975
Proposed Changes:
Allow the `tools` kwarg for AWS Bedrock Anthropic models, because it is listed as a valid parameter here: https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages.html
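For illustration, a rough sketch of how the kwarg could be used once it is accepted. The tool definition follows the Anthropic messages schema from the AWS docs linked above; the specific tool and setup here are made up for the example:

```python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.amazon_bedrock import AmazonBedrockChatGenerator

# Hypothetical tool definition in the Anthropic messages format (see the AWS docs above)
top_song_tool = {
    "name": "top_song",
    "description": "Get the most popular song played on a radio station.",
    "input_schema": {
        "type": "object",
        "properties": {"sign": {"type": "string", "description": "Radio station call sign, e.g. WZPZ"}},
        "required": ["sign"],
    },
}

generator = AmazonBedrockChatGenerator(model="anthropic.claude-3-haiku-20240307-v1:0")
result = generator.run(
    messages=[ChatMessage.from_user("What is the most popular song on WZPZ?")],
    generation_kwargs={"tools": [top_song_tool]},  # the kwarg this PR allows through
)
print(result["replies"])
```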
How did you test it?
I did not, as it's a tiny fix and I didn't really have time to set up the whole testing process.
Notes for the reviewer
Checklist
- I have read the contributors guidelines and the code of conduct
- I have updated the related issue with new insights and changes
- I added unit tests and updated the docstrings
- I've used one of the conventional commit types for my PR title:
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
LGTM, but please have a look as well @davidsbatista; the context is in this Discord thread.
@lambda-science would you please add a unit test? Not so much to add a deep, rigorous test case here, but to have a test that fails if `tools` gets removed for whatever reason 🙏
I will look into it tomorrow :)
@lambda-science looks good to me; as @vblagoje mentioned, please just add a unit test.
Hello,
I worked on adding a test for tool use with Bedrock, and multiple problems appeared (so it was a good idea).
The current Bedrock implementation processed the replies too strictly and dropped answers of the `tool_use` type, so I had to modify the adapter to include `tool_use` replies in the `ChatMessage`s by dumping the `tool_use` block as JSON content.
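To make that concrete, a minimal sketch of the idea (the helper name and surrounding code are assumptions, not the actual diff):

```python
import json
from haystack.dataclasses import ChatMessage

def _extract_replies(response_body: dict) -> list:
    # Hypothetical helper: keep both "text" and "tool_use" content blocks,
    # serializing tool_use blocks to JSON so they fit into a ChatMessage.
    replies = []
    for block in response_body.get("content", []):
        if block.get("type") == "text":
            replies.append(ChatMessage.from_assistant(block["text"]))
        elif block.get("type") == "tool_use":
            replies.append(ChatMessage.from_assistant(json.dumps(block)))
    return replies
```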
The issue is that the test is flaky: either it passes, or it crashes with:
```
@component.output_types(replies=List[ChatMessage])
def run(
    self,
    messages: List[ChatMessage],
    streaming_callback: Optional[Callable[[StreamingChunk], None]] = None,
    generation_kwargs: Optional[Dict[str, Any]] = None,
):
    """
    Generates a list of `ChatMessage` response to the given messages using the Amazon Bedrock LLM.

    :param messages: The messages to generate a response to.
    :param streaming_callback:
        A callback function that is called when a new token is received from the stream.
    :param generation_kwargs: Additional generation keyword arguments passed to the model.
    :returns: A dictionary with the following keys:
        - `replies`: The generated List of `ChatMessage` objects.
    """
    generation_kwargs = generation_kwargs or {}
    generation_kwargs = generation_kwargs.copy()
    streaming_callback = streaming_callback or self.streaming_callback
    generation_kwargs["stream"] = streaming_callback is not None

    # check if the prompt is a list of ChatMessage objects
    if not (
        isinstance(messages, list)
        and len(messages) > 0
        and all(isinstance(message, ChatMessage) for message in messages)
    ):
        msg = f"The model {self.model} requires a list of ChatMessage objects as a prompt."
        raise ValueError(msg)

    body = self.model_adapter.prepare_body(
        messages=messages, **{"stop_words": self.stop_words, **generation_kwargs}
    )
    try:
        if streaming_callback:
            response = self.client.invoke_model_with_response_stream(
                body=json.dumps(body), modelId=self.model, accept="application/json", contentType="application/json"
            )
            response_stream = response["body"]
            replies = self.model_adapter.get_stream_responses(
                stream=response_stream, streaming_callback=streaming_callback
            )
        else:
            response = self.client.invoke_model(
                body=json.dumps(body), modelId=self.model, accept="application/json", contentType="application/json"
            )
            response_body = json.loads(response.get("body").read().decode("utf-8"))
            replies = self.model_adapter.get_responses(response_body=response_body)
    except ClientError as exception:
        msg = f"Could not inference Amazon Bedrock model {self.model} due: {exception}"
        raise AmazonBedrockInferenceError(msg) from exception

    # rename the meta key to be inline with OpenAI meta output keys
    for response in replies:
        if response.meta is not None and "usage" in response.meta:
>           response.meta["usage"]["prompt_tokens"] = response.meta["usage"].pop("input_tokens")
E           KeyError: 'input_tokens'

src\haystack_integrations\components\generators\amazon_bedrock\chat\chat_generator.py:209: KeyError
FAILED tests/test_chat_generator.py::TestAnthropicClaudeAdapter::test_tools_use[anthropic.claude-3-haiku-20240307-v1:0] - KeyError: 'input_tokens'
```
The reason is that this test produces two types of answers:
- Either the model answers with only ONE reply of the `tool_use` type, like `{'type': 'tool_use', 'id': 'toolu_bdrk_01NcAB4KKhQzdV6eX1ji1Hpk', 'name': 'top_song', 'input': {'sign': 'WZPZ'}}` => TEST IS PASSING
- Or the model produces two replies:
  - One `text` type answer such as `{'type': 'text', 'text': 'Here is the most popular song played on radio station WZPZ:'}`
  - One `tool_use` type answer as before: `{'type': 'tool_use', 'id': 'toolu_bdrk_01NcAB4KKhQzdV6eX1ji1Hpk', 'name': 'top_song', 'input': {'sign': 'WZPZ'}}`

But in this second case only one instance of metadata is present (prompt/completion/input/output tokens). I'm not entirely sure how to handle this; your input would be valuable. @vblagoje @davidsbatista
Also, I'm not sure what the standard is for `tool_use` output in a `ChatMessage`; I simply dumped the whole reply as JSON. Maybe that's not the right way to do it, but the test passes (copied over from the Anthropic implementation, the non-Bedrock one).
Also, this is 100% based on the InvokeModel way of calling tools; the new Converse API from Bedrock uses a different, model-provider-agnostic tool spec input. It has its advantages and drawbacks, let's say... For now, let's focus on making it work with the current InvokeModel implementation.
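For reference, the two tool-spec shapes differ roughly like this (both trimmed down; shapes taken from the AWS docs, values made up):

```python
# InvokeModel, Anthropic messages format -- what this PR targets
invoke_model_tool = {
    "name": "top_song",
    "description": "Get the most popular song played on a radio station.",
    "input_schema": {"type": "object", "properties": {"sign": {"type": "string"}}, "required": ["sign"]},
}

# Converse API toolConfig, model-provider agnostic -- out of scope here
converse_tool_config = {
    "tools": [
        {
            "toolSpec": {
                "name": "top_song",
                "description": "Get the most popular song played on a radio station.",
                "inputSchema": {"json": {"type": "object", "properties": {"sign": {"type": "string"}}, "required": ["sign"]}},
            }
        }
    ]
}
```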
Maybe we need a way to force the generator to discard the first `text` answer if there is a `tool_use` answer right after it, so that only the `tool_use` reply is output? Because having two possible types of answers is a mess to handle from a developer's point of view.
Let's say you want your model to do function calling: you then have to check whether the replies you get are `text` or `tool_use` to extract the right JSON; you can't simply take the first answer and call it a day (like the new tool test does, taken directly from the Anthropic integration).
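To illustrate the extra work this pushes onto the caller, something like this sketch (assuming, as in this PR, that the `tool_use` reply is a JSON dump stored in the `ChatMessage` content):

```python
import json

def extract_tool_call(replies):
    # With two possible reply shapes, the caller has to probe every reply
    # instead of just taking replies[0].
    for reply in replies:
        try:
            payload = json.loads(reply.content)
        except (TypeError, ValueError):
            continue  # plain text reply, not a JSON-dumped tool_use block
        if isinstance(payload, dict) and payload.get("type") == "tool_use":
            return payload["name"], payload["input"]
    return None  # the model did not call a tool
```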
@lambda-science please add it as a new test function
Hello, I'm not sure what you meant by this, as I added the test as a new function?
On another note, I think I solved the flaky test. What I did: when the `stop_reason` is `tool_use`, we only return the content of the `tool_use` type (effectively discarding the `text` type content); otherwise we return the chat messages as usual.
The flaky test happened because, when normalizing the meta keys, we mutated a dictionary (`meta`) that was not a deep copy. So if we have two replies, by the second reply the dictionary keys are already modified! My modification avoids this because, if it's a `tool_use`, there will only ever be one reply.
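A minimal sketch of the two fixes described above (names and structure are assumptions, not the exact code):

```python
import copy
import json
from haystack.dataclasses import ChatMessage

def _build_replies(response_body: dict) -> list:
    content = response_body.get("content", [])
    if response_body.get("stop_reason") == "tool_use":
        # Keep only the tool_use blocks so there is a single reply and the
        # optional text preamble is discarded.
        content = [block for block in content if block.get("type") == "tool_use"]
    replies = []
    for block in content:
        text = block["text"] if block["type"] == "text" else json.dumps(block)
        # Give every reply its own copy of the usage dict, so renaming keys
        # on one reply cannot mutate the meta seen by the next one.
        usage = copy.deepcopy(response_body.get("usage", {}))
        replies.append(ChatMessage.from_assistant(text, meta={"usage": usage}))
    return replies
```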
All tests are passing now
> Hello, I'm not sure what you meant by this, as I added the test as a new function?
sorry, I overlooked that you did it
Ah, as a possible further improvement for this PR, we could/should also add the `tool_choice` option (with a test, in the scope of this function-calling improvement). I will work on this in the coming days: https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages.html
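For the record, `tool_choice` in the Anthropic messages format takes roughly these shapes (per the AWS docs linked above; `top_song_tool` is the hypothetical definition from the example near the top):

```python
# Let the model decide whether to call a tool
generation_kwargs = {"tools": [top_song_tool], "tool_choice": {"type": "auto"}}

# Force the model to call some tool
generation_kwargs = {"tools": [top_song_tool], "tool_choice": {"type": "any"}}

# Force the model to call a specific tool
generation_kwargs = {"tools": [top_song_tool], "tool_choice": {"type": "tool", "name": "top_song"}}
```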
Yes @lambda-science add it, your intuition is the best guideline here. I'll take a look tomorrow 🙏
Everything should be good now, ready for review :) The `tools` and `tool_choice` params were added with an integration test (you should run it in the CI/CD; I guess it doesn't run by default).
I would be happy to get pinged when it's released on PyPI ✌️
@lambda-science ping 🙂