Explore adding support for tool calling in `HuggingFaceAPIChatGenerator` when streaming
**Describe the Feature**
It would be great to add support for tool calling when running `HuggingFaceAPIChatGenerator` in streaming mode.

As shown here https://github.com/deepset-ai/haystack/blob/2ccdba3e99024072c69b4752a6478284813dd182/haystack/components/generators/chat/hugging_face_api.py#L411-L412 we only process the generated text, and we only store it as text content here https://github.com/deepset-ai/haystack/blob/2ccdba3e99024072c69b4752a6478284813dd182/haystack/components/generators/chat/hugging_face_api.py#L436 whereas we should properly populate the `tool_calls` param of `ChatMessage` when a tool call is present.
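For reference, here is the shape such a message would take. This is just an illustrative sketch using Haystack's `ChatMessage.from_assistant` and `ToolCall`, with made-up argument values:

```python
from haystack.dataclasses import ChatMessage, ToolCall

# A streamed reply containing a tool call should ultimately yield a
# ChatMessage with tool_calls populated, not just text content.
message = ChatMessage.from_assistant(
    tool_calls=[ToolCall(tool_name="weather", arguments={"city": "Paris"})]
)
```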
The underlying HuggingFace streaming chunk dataclass does contain tool call information:

```python
@dataclass_with_extra
class ChatCompletionStreamOutputDelta(BaseInferenceType):
    role: str
    content: Optional[str] = None
    tool_call_id: Optional[str] = None
    tool_calls: Optional[List[ChatCompletionStreamOutputDeltaToolCall]] = None
```
**Additional context**
It looks like `_run_streaming` would need to be updated to process tool calling streaming chunks.
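As a rough sketch (not the actual Haystack code), `_run_streaming` could buffer tool call deltas per `index` and build the final `ChatMessage` from them. The helper name and buffering scheme below are illustrative assumptions, and error handling is omitted:

```python
import json
from typing import Dict

from haystack.dataclasses import ChatMessage, ToolCall


def build_message_from_stream(api_output) -> ChatMessage:
    """Collect text and tool call deltas from a chat_completion stream."""
    generated_text = ""
    # Fragments of each tool call arrive across chunks, keyed by `index`.
    buffers: Dict[int, dict] = {}

    for chunk in api_output:  # ChatCompletionStreamOutput items
        delta = chunk.choices[0].delta
        if delta.content:
            generated_text += delta.content
        for tc in delta.tool_calls or []:
            buf = buffers.setdefault(tc.index, {"id": None, "name": None, "arguments": ""})
            if tc.id:
                buf["id"] = tc.id
            if tc.function.name:
                buf["name"] = tc.function.name
            # `arguments` streams as string fragments to be concatenated.
            buf["arguments"] += tc.function.arguments or ""

    tool_calls = [
        # Assumes the concatenated fragments form valid JSON by stream end.
        ToolCall(tool_name=b["name"], arguments=json.loads(b["arguments"]), id=b["id"])
        for b in buffers.values()
    ]
    return ChatMessage.from_assistant(text=generated_text or None, tool_calls=tool_calls or None)
```

The per-`index` buffering mirrors how OpenAI-compatible APIs stream tool call arguments as string fragments.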
**To Reproduce**
```python
from haystack.tools import Tool
from haystack.dataclasses import ChatMessage
from haystack.components.generators.chat.hugging_face_api import HuggingFaceAPIChatGenerator
from haystack.components.generators.utils import print_streaming_chunk
from haystack.utils import HFGenerationAPIType


def get_weather(city: str) -> str:
    """Get weather information for a city."""
    return f"The weather in {city} is Sunny and 22 C"


tool_parameters = {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
tool = Tool(
    name="weather",
    description="useful to determine the weather in a given location",
    parameters=tool_parameters,
    function=get_weather,
)

chat_messages = [ChatMessage.from_user("What's the weather like in Paris?")]
generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={"model": "NousResearch/Hermes-3-Llama-3.1-8B"},
    generation_kwargs={"temperature": 0.5},
    streaming_callback=print_streaming_chunk,
)

results = generator.run(chat_messages, tools=[tool])
```
If I'm not wrong, this was done because the HF API does not support tools + streaming in a stable/reliable way.
https://github.com/deepset-ai/haystack/blob/2ccdba3e99024072c69b4752a6478284813dd182/haystack/components/generators/chat/hugging_face_api.py#L324-L325
Ahh okay, good to know. Is there a place we could check to see if that is still the case? They do have the ability to return tool call information in their streaming chunks. And at least looking at their current spec for `ChatCompletionStreamOutputDeltaToolCall`, it looks well specified with dataclasses:
```python
@dataclass_with_extra
class ChatCompletionStreamOutputDeltaToolCall(BaseInferenceType):
    function: ChatCompletionStreamOutputFunction
    id: str
    index: int
    type: str


@dataclass_with_extra
class ChatCompletionStreamOutputFunction(BaseInferenceType):
    arguments: str
    name: Optional[str] = None
```
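One way to check would be to call the HF API directly through `huggingface_hub` and inspect the streamed deltas. A minimal probe sketch, assuming `InferenceClient.chat_completion` with `tools` and `stream=True`; the model choice is arbitrary and the printed fields are just for inspection:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="meta-llama/Llama-3.1-8B-Instruct")

tools = [{
    "type": "function",
    "function": {
        "name": "weather",
        "description": "useful to determine the weather in a given location",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
    },
}]

# Stream a tool-calling request and dump each delta to see whether
# tool_calls chunks arrive and in what shape.
for chunk in client.chat_completion(
    messages=[{"role": "user", "content": "What's the weather like in Paris?"}],
    tools=tools,
    stream=True,
):
    choice = chunk.choices[0]
    print(choice.delta.content, choice.delta.tool_calls, choice.finish_reason)
```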
Do you remember what about it wasn't stable?
Explained in https://github.com/deepset-ai/haystack-experimental/pull/120#issue-2592418479 (several links available). This information may be outdated.
More detail specifically in this comment: https://github.com/deepset-ai/haystack-experimental/pull/120#discussion_r1806334949
The HF API allows this use case, but I made this decision for the following reasons:
- the streaming output to expect when there are tool calls is practically undocumented
- based on my experiments, depending on the model, the API returns different end tokens (e.g. `</s>`, `<|eot_id|>`, ...)
- even if some models (Llama 3.1) support producing multiple tool calls in the same interaction, I could not reproduce this with the HF API and I could not infer the format to expect

In short, on the HF API side, it seems to me that there is work to be done in terms of standardization and documentation. We may support tools + streaming in the future, when things are clearer...
Could be worth rechecking whether this is still the case.