Using anchors and aliases in Pipeline YAML

Open julian-risch opened this issue 11 months ago • 1 comment

Is your feature request related to a problem? Please describe.
ChatGenerators and ToolInvokers both have a tools parameter, and in many pipelines that parameter is set to the same object: the same tool or list of tools.
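For context, a rough sketch of how such a pipeline is built in Python; the dummy_weather helper is illustrative and the import paths follow Haystack 2.x to the best of my knowledge, loosely following the tools cookbook:

```python
# Illustrative sketch (not from the issue): the same tool list is handed to
# both the chat generator and the ToolInvoker, so its definition is currently
# duplicated in the serialized YAML.
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.tools import ToolInvoker
from haystack.tools import Tool


def dummy_weather(location: str) -> str:
    # Hypothetical helper standing in for a real weather lookup.
    return f"The weather in {location} is sunny."


weather_tool = Tool(
    name="weather",
    description="A tool to get the weather",
    parameters={
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
    function=dummy_weather,
)

tools = [weather_tool]
generator = OpenAIChatGenerator(model="gpt-4o-mini", tools=tools)
tool_invoker = ToolInvoker(tools=tools)  # same list object as the generator's
```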

Describe the solution you'd like
We should investigate if and how anchors and aliases can be used in Haystack pipelines.

  • Check if a pipeline YAML with manually added anchors and aliases can be loaded correctly
  • Investigate when the YamlDumper makes use of anchors and aliases (see the sketch after this list)
  • Document the findings, for example extending https://docs.haystack.deepset.ai/docs/serialization
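A minimal sketch of the second point using plain PyYAML rather than Haystack's own serialization (the dict layout is illustrative): PyYAML's default Dumper emits an anchor the first time it serializes an object and an alias for every later reference to the same Python object, and loading resolves the alias back into a single shared object.

```python
import yaml

# One shared tool definition (a single Python object).
weather_tool = {
    "name": "weather",
    "description": "A tool to get the weather",
    "function": "__main__.dummy_weather",
}
shared_tools = [weather_tool]

# Illustrative stand-in for a pipeline dict that reuses the same tools list.
pipeline_dict = {
    "components": {
        "generator": {"init_parameters": {"tools": shared_tools}},
        "tool_invoker": {"init_parameters": {"tools": shared_tools}},
    }
}

dumped = yaml.dump(pipeline_dict)
print(dumped)  # the reused list appears once as "tools: &id001 ..." and once as "tools: *id001"

# Round-trip: the alias is resolved back into a single shared object on load.
loaded = yaml.safe_load(dumped)
assert (
    loaded["components"]["generator"]["init_parameters"]["tools"]
    is loaded["components"]["tool_invoker"]["init_parameters"]["tools"]
)
```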

Example without anchors and aliases
A simple pipeline with tools, based on the cookbook.

Pipeline 1

```yaml
components:
  generator:
    init_parameters:
      api_base_url: null
      api_key:
        env_vars:
        - OPENAI_API_KEY
        strict: true
        type: env_var
      generation_kwargs: {}
      max_retries: null
      model: gpt-4o-mini
      organization: null
      streaming_callback: null
      timeout: null
      tools:
      - description: A tool to get the weather
        function: __main__.dummy_weather
        name: weather
        parameters:
          properties:
            location:
              type: string
          required:
          - location
          type: object
      tools_strict: false
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
  router:
    init_parameters:
      custom_filters: {}
      optional_variables: []
      routes:
      - condition: '{{replies[0].tool_calls | length > 0}}'
        output: '{{replies}}'
        output_name: there_are_tool_calls
        output_type: typing.List[haystack.dataclasses.chat_message.ChatMessage]
      - condition: '{{replies[0].tool_calls | length == 0}}'
        output: '{{replies}}'
        output_name: final_replies
        output_type: typing.List[haystack.dataclasses.chat_message.ChatMessage]
      unsafe: true
      validate_output_type: false
    type: haystack.components.routers.conditional_router.ConditionalRouter
  tool_invoker:
    init_parameters:
      convert_result_to_json_string: false
      raise_on_failure: true
      tools:
      - description: A tool to get the weather
        function: __main__.dummy_weather
        name: weather
        parameters:
          properties:
            location:
              type: string
          required:
          - location
          type: object
    type: haystack.components.tools.tool_invoker.ToolInvoker
connections:
- receiver: router.replies
  sender: generator.replies
- receiver: tool_invoker.messages
  sender: router.there_are_tool_calls
max_runs_per_component: 100
metadata: {}
```

Example with anchors and aliases
Note how the alias *id001 is used for the tools parameter of the ToolInvoker instead of redefining what was already defined for the tools parameter of the OpenAIChatGenerator under the anchor &id001.

Pipeline 2

```yaml
components:
  generator:
    init_parameters:
      api_base_url: null
      api_key:
        env_vars:
        - OPENAI_API_KEY
        strict: true
        type: env_var
      generation_kwargs: {}
      max_retries: null
      model: gpt-4o-mini
      organization: null
      streaming_callback: null
      timeout: null
      tools: &id001
      - description: A tool to get the weather
        function: __main__.dummy_weather
        name: weather
        parameters:
          properties:
            location:
              type: string
          required:
          - location
          type: object
      tools_strict: false
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
  router:
    init_parameters:
      custom_filters: {}
      optional_variables: []
      routes:
      - condition: '{{replies[0].tool_calls | length > 0}}'
        output: '{{replies}}'
        output_name: there_are_tool_calls
        output_type: typing.List[haystack.dataclasses.chat_message.ChatMessage]
      - condition: '{{replies[0].tool_calls | length == 0}}'
        output: '{{replies}}'
        output_name: final_replies
        output_type: typing.List[haystack.dataclasses.chat_message.ChatMessage]
      unsafe: true
      validate_output_type: false
    type: haystack.components.routers.conditional_router.ConditionalRouter
  tool_invoker:
    init_parameters:
      convert_result_to_json_string: false
      raise_on_failure: true
      tools: *id001
    type: haystack.components.tools.tool_invoker.ToolInvoker
connections:
- receiver: router.replies
  sender: generator.replies
- receiver: tool_invoker.messages
  sender: router.there_are_tool_calls
max_runs_per_component: 100
metadata: {}
```
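For the first checklist item, a hedged sketch of how Pipeline 2 could be loaded to verify that the alias is resolved correctly; it assumes the YAML above is saved as pipeline_2.yml (an assumed filename) and that dummy_weather is defined in the running script, since the tool definitions reference __main__.dummy_weather:

```python
from haystack import Pipeline


def dummy_weather(location: str) -> str:
    # Must exist so the serialized tool reference __main__.dummy_weather resolves.
    return f"The weather in {location} is sunny."


with open("pipeline_2.yml") as f:
    pipeline = Pipeline.load(f)  # the YAML loader resolves *id001 to the &id001 list

# Both components should now hold the same tool definitions.
print(pipeline.get_component("generator").tools)
print(pipeline.get_component("tool_invoker").tools)
```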

julian-risch · Jan 09 '25

Hi @julian-risch, I just tested anchors and it works like a charm:

```yaml
custom_mapping:
  some_resuable_subconfig: &some_resuable_subconfig
    init_parameters:
      required_variables: null
      template:
        - _content:
            - text: '

              Please create a summary about the following topic:

              {{ topic }}

              '
          _meta: {}
          _name: null
          _role: user
      variables: null
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder

components:
  builder: *some_resuable_subconfig
  llm:
    init_parameters:
      generation_kwargs:
        max_new_tokens: 150
        stop_sequences: []
      huggingface_pipeline_kwargs:
        device: cpu
        model: Qwen/Qwen2.5-1.5B-Instruct
        task: text-generation
      streaming_callback: null
      token:
        env_vars:
          - HF_API_TOKEN
          - HF_TOKEN
        strict: false
        type: env_var
    type: haystack.components.generators.chat.hugging_face_local.HuggingFaceLocalChatGenerator

connections:
  - receiver: llm.messages
    sender: builder.prompt

max_runs_per_component: 100
metadata: {}
```

So YAML => Haystack pipeline works, thanks to the yaml package. The question now is how to build the YAML from a pipeline in memory, and whether we would like anchors to be used during generation (it might make sense for first-level predefined classes like document stores or tools).
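A minimal sketch of that generation question with plain PyYAML (illustrative dict, not Haystack's own YamlDumper): whether anchors show up in the output depends on whether the in-memory data reuses the same Python object, and a custom Dumper can suppress them entirely.

```python
import yaml


class NoAliasDumper(yaml.Dumper):
    def ignore_aliases(self, data):
        # Never emit anchors/aliases; shared objects are duplicated instead.
        return True


shared_tool = {"name": "weather", "description": "A tool to get the weather"}
data = {
    "generator": {"tools": [shared_tool]},
    "tool_invoker": {"tools": [shared_tool]},  # same dict object reused
}

print(yaml.dump(data))                        # contains &id001 / *id001 for the shared dict
print(yaml.dump(data, Dumper=NoAliasDumper))  # the tool definition is written out twice
```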

ArzelaAscoIi · Mar 21 '25