
Explore adding support for tool calling in `HuggingFaceAPIChatGenerator` when streaming

Open sjrl opened this issue 7 months ago • 4 comments

Describe the Feature

It would be great to add support for tool calling when running `HuggingFaceAPIChatGenerator` in streaming mode.

As shown here:

https://github.com/deepset-ai/haystack/blob/2ccdba3e99024072c69b4752a6478284813dd182/haystack/components/generators/chat/hugging_face_api.py#L411-L412

we only process the generated text, and we store it as plain text content here:

https://github.com/deepset-ai/haystack/blob/2ccdba3e99024072c69b4752a6478284813dd182/haystack/components/generators/chat/hugging_face_api.py#L436

whereas we should also populate the `tool_calls` parameter of `ChatMessage` when a tool call is present.

The underlying Hugging Face streaming chunk dataclass does contain tool call information:

@dataclass_with_extra
class ChatCompletionStreamOutputDelta(BaseInferenceType):
    role: str
    content: Optional[str] = None
    tool_call_id: Optional[str] = None
    tool_calls: Optional[List[ChatCompletionStreamOutputDeltaToolCall]] = None

Additional context

It looks like `_run_streaming` would need to be updated to process tool calling streaming chunks.
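Very roughly, that update would need to group deltas by `index` and concatenate the streamed `arguments` fragments before parsing them as JSON once the stream ends. A self-contained sketch (the stand-in dataclasses and the `accumulate_tool_calls` helper are hypothetical, not the actual `_run_streaming` code; the assumption that fragments of one call share an `index` matches the dataclass spec but is not documented by HF):

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional

# Minimal stand-ins for the HF streaming delta types shown above.
@dataclass
class FunctionDelta:
    arguments: str
    name: Optional[str] = None

@dataclass
class ToolCallDelta:
    index: int
    function: FunctionDelta
    id: Optional[str] = None

def accumulate_tool_calls(deltas: List[ToolCallDelta]) -> List[dict]:
    """Group streamed deltas by index, concatenating argument fragments."""
    calls: Dict[int, dict] = {}
    for delta in deltas:
        call = calls.setdefault(delta.index, {"id": None, "name": None, "arguments": ""})
        if delta.id:
            call["id"] = delta.id
        if delta.function.name:
            call["name"] = delta.function.name
        call["arguments"] += delta.function.arguments
    # Parse the accumulated JSON argument strings once the stream is done.
    return [
        {"id": c["id"], "name": c["name"], "arguments": json.loads(c["arguments"])}
        for c in calls.values()
    ]

chunks = [
    ToolCallDelta(index=0, id="call_0", function=FunctionDelta(name="weather", arguments='{"ci')),
    ToolCallDelta(index=0, function=FunctionDelta(arguments='ty": "Paris"}')),
]
print(accumulate_tool_calls(chunks))
# → [{'id': 'call_0', 'name': 'weather', 'arguments': {'city': 'Paris'}}]
```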

To Reproduce

from haystack.tools import Tool
from haystack.dataclasses import ChatMessage
from haystack.components.generators.chat.hugging_face_api import HuggingFaceAPIChatGenerator
from haystack.components.generators.utils import print_streaming_chunk
from haystack.utils.hf import HFGenerationAPIType

def get_weather(city: str) -> str:
    """Get weather information for a city."""
    return f"The weather in {city} is Sunny and 22 C"

tool_parameters = {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
tool = Tool(
    name="weather",
    description="useful to determine the weather in a given location",
    parameters=tool_parameters,
    function=get_weather,
)

chat_messages = [ChatMessage.from_user("What's the weather like in Paris?")]
generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={"model": "NousResearch/Hermes-3-Llama-3.1-8B"},
    generation_kwargs={"temperature": 0.5},
    streaming_callback=print_streaming_chunk,
)
results = generator.run(chat_messages, tools=[tool])

sjrl avatar May 12 '25 06:05 sjrl

If I am not wrong, this was done because the HF API does not support tools + streaming in a stable/reliable way:

https://github.com/deepset-ai/haystack/blob/2ccdba3e99024072c69b4752a6478284813dd182/haystack/components/generators/chat/hugging_face_api.py#L324-L325

anakin87 avatar May 12 '25 07:05 anakin87

Ahh okay, good to know. Is there a place we could check to see if that is still the case? They do have the ability to return tool call information in their streaming chunks. And at least looking at their current spec, `ChatCompletionStreamOutputDeltaToolCall` looks well specified with dataclasses:

@dataclass_with_extra
class ChatCompletionStreamOutputDeltaToolCall(BaseInferenceType):
    function: ChatCompletionStreamOutputFunction
    id: str
    index: int
    type: str
@dataclass_with_extra
class ChatCompletionStreamOutputFunction(BaseInferenceType):
    arguments: str
    name: Optional[str] = None
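One thing the spec does make clear: `arguments` is typed as `str`, not a dict, so in a stream it carries partial JSON that the consumer must concatenate before parsing. For illustration (the fragment boundaries below are made up; real chunking depends on the server):

```python
import json

# Hypothetical fragments of one function call's arguments, as they
# might arrive across several stream chunks.
fragments = ['{"cit', 'y": "Pa', 'ris"}']

# Each fragment alone is invalid JSON; only the concatenation parses.
arguments = "".join(fragments)
print(json.loads(arguments))
# → {'city': 'Paris'}
```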

Do you remember what about it wasn't stable?

sjrl avatar May 12 '25 07:05 sjrl

Explained in https://github.com/deepset-ai/haystack-experimental/pull/120#issue-2592418479 (several links available). This information can be outdated.

anakin87 avatar May 12 '25 07:05 anakin87

More detail specifically in this comment https://github.com/deepset-ai/haystack-experimental/pull/120#discussion_r1806334949

The HF API allows this use case, but I made this decision for the following reasons:

  • the streaming output to expect when there are tool calls is practically undocumented
  • based on my experiments, depending on the model, the API returns different end tokens (e.g. </s>, <|eot_id|>, ...)
  • even if some models (Llama 3.1) support producing multiple tool calls in the same interaction, I could not reproduce this with the HF API and could not infer the format to expect

In short, on the HF API side, it seems to me that there is work to be done in terms of standardization and documentation. We may support tools+streaming in the future, when things are clearer...

It could be worth rechecking whether this is still the case.

sjrl avatar May 12 '25 07:05 sjrl