Using anchors and aliases in Pipeline YAML
Is your feature request related to a problem? Please describe.

ChatGenerators and ToolInvokers both have a tools parameter, and in many pipelines both can be set to the same object: the same tool or list of tools.
Describe the solution you'd like

We should investigate whether and how anchors and aliases can be used in Haystack pipelines.
- Check whether a pipeline YAML with manually added anchors and aliases can be loaded correctly
- Investigate when the YamlDumper makes use of anchors and aliases
- Document the findings, for example extending https://docs.haystack.deepset.ai/docs/serialization
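
As a starting point for the first item, here is a minimal sketch using plain PyYAML rather than Haystack's own loader. The trimmed-down YAML and its keys are a hypothetical stand-in for a real pipeline definition, and whether Haystack's loader behaves identically is an assumption that still needs to be verified.

```python
import yaml

# Hypothetical, trimmed-down stand-in for a pipeline definition.
PIPELINE_YAML = """
components:
  generator:
    init_parameters:
      tools: &id001
      - name: weather
        description: A tool to get the weather
  tool_invoker:
    init_parameters:
      tools: *id001
"""

data = yaml.safe_load(PIPELINE_YAML)
generator_tools = data["components"]["generator"]["init_parameters"]["tools"]
invoker_tools = data["components"]["tool_invoker"]["init_parameters"]["tools"]

# The alias is resolved at load time, so both keys carry the full tool list.
print(generator_tools == invoker_tools)
# PyYAML constructs an anchored node once and reuses it for every alias.
print(generator_tools is invoker_tools)
```

With plain PyYAML the alias is expanded before any consumer sees the data, so code that reads the resulting dict does not need to know that anchors were used.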
Example without anchors and aliases

A simple pipeline with tools, based on the cookbook.
Pipeline 1
```yaml
components:
  generator:
    init_parameters:
      api_base_url: null
      api_key:
        env_vars:
        - OPENAI_API_KEY
        strict: true
        type: env_var
      generation_kwargs: {}
      max_retries: null
      model: gpt-4o-mini
      organization: null
      streaming_callback: null
      timeout: null
      tools:
      - description: A tool to get the weather
        function: __main__.dummy_weather
        name: weather
        parameters:
          properties:
            location:
              type: string
          required:
          - location
          type: object
      tools_strict: false
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
  router:
    init_parameters:
      custom_filters: {}
      optional_variables: []
      routes:
      - condition: '{{replies[0].tool_calls | length > 0}}'
        output: '{{replies}}'
        output_name: there_are_tool_calls
        output_type: typing.List[haystack.dataclasses.chat_message.ChatMessage]
      - condition: '{{replies[0].tool_calls | length == 0}}'
        output: '{{replies}}'
        output_name: final_replies
        output_type: typing.List[haystack.dataclasses.chat_message.ChatMessage]
      unsafe: true
      validate_output_type: false
    type: haystack.components.routers.conditional_router.ConditionalRouter
  tool_invoker:
    init_parameters:
      convert_result_to_json_string: false
      raise_on_failure: true
      tools:
      - description: A tool to get the weather
        function: __main__.dummy_weather
        name: weather
        parameters:
          properties:
            location:
              type: string
          required:
          - location
          type: object
    type: haystack.components.tools.tool_invoker.ToolInvoker
connections:
- receiver: router.replies
  sender: generator.replies
- receiver: tool_invoker.messages
  sender: router.there_are_tool_calls
max_runs_per_component: 100
metadata: {}
```

Example with anchors and aliases
Note how `*id001` is used for the tools parameter of the ToolInvoker instead of repeating the definition that was already given for the tools parameter of the OpenAIChatGenerator under `&id001`.
Pipeline 2
```yaml
components:
  generator:
    init_parameters:
      api_base_url: null
      api_key:
        env_vars:
        - OPENAI_API_KEY
        strict: true
        type: env_var
      generation_kwargs: {}
      max_retries: null
      model: gpt-4o-mini
      organization: null
      streaming_callback: null
      timeout: null
      tools: &id001
      - description: A tool to get the weather
        function: __main__.dummy_weather
        name: weather
        parameters:
          properties:
            location:
              type: string
          required:
          - location
          type: object
      tools_strict: false
    type: haystack.components.generators.chat.openai.OpenAIChatGenerator
  router:
    init_parameters:
      custom_filters: {}
      optional_variables: []
      routes:
      - condition: '{{replies[0].tool_calls | length > 0}}'
        output: '{{replies}}'
        output_name: there_are_tool_calls
        output_type: typing.List[haystack.dataclasses.chat_message.ChatMessage]
      - condition: '{{replies[0].tool_calls | length == 0}}'
        output: '{{replies}}'
        output_name: final_replies
        output_type: typing.List[haystack.dataclasses.chat_message.ChatMessage]
      unsafe: true
      validate_output_type: false
    type: haystack.components.routers.conditional_router.ConditionalRouter
  tool_invoker:
    init_parameters:
      convert_result_to_json_string: false
      raise_on_failure: true
      tools: *id001
    type: haystack.components.tools.tool_invoker.ToolInvoker
connections:
- receiver: router.replies
  sender: generator.replies
- receiver: tool_invoker.messages
  sender: router.there_are_tool_calls
max_runs_per_component: 100
metadata: {}
```
Hi @julian-risch, I just tested anchors and they work like a charm:
```yaml
custom_mapping:
  some_resuable_subconfig: &some_resuable_subconfig
    init_parameters:
      required_variables: null
      template:
      - _content:
        - text: '
            Please create a summary about the following topic:
            {{ topic }}
            '
        _meta: {}
        _name: null
        _role: user
      variables: null
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
components:
  builder: *some_resuable_subconfig
  llm:
    init_parameters:
      generation_kwargs:
        max_new_tokens: 150
        stop_sequences: []
      huggingface_pipeline_kwargs:
        device: cpu
        model: Qwen/Qwen2.5-1.5B-Instruct
        task: text-generation
      streaming_callback: null
      token:
        env_vars:
        - HF_API_TOKEN
        - HF_TOKEN
        strict: false
        type: env_var
    type: haystack.components.generators.chat.hugging_face_local.HuggingFaceLocalChatGenerator
connections:
- receiver: llm.messages
  sender: builder.prompt
max_runs_per_component: 100
metadata: {}
```
So going from YAML to a Haystack pipeline works, thanks to the yaml package. The question now is how to build the YAML from a pipeline in memory, and whether we would like anchors to be used in the generated output (this might make sense for first-level predefined classes like document stores or tools).
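
For the in-memory direction, the following sketch shows how plain PyYAML decides when to emit anchors. It assumes nothing about Haystack's own dumper: the dictionaries are hypothetical stand-ins for a serialized pipeline, and `NoAliasDumper` is an illustrative name, not an existing Haystack class. PyYAML emits an anchor only when the very same non-scalar object is referenced more than once; equal but separate copies are written out in full.

```python
import yaml

tool = {"name": "weather", "description": "A tool to get the weather"}
tools = [tool]

# Both components reference the *same* list object.
shared = {"generator": {"tools": tools}, "tool_invoker": {"tools": tools}}

# Both components hold equal but *separate* copies.
copied = {
    "generator": {"tools": [dict(tool)]},
    "tool_invoker": {"tools": [dict(tool)]},
}


class NoAliasDumper(yaml.SafeDumper):
    """Illustrative dumper that never emits anchors or aliases."""

    def ignore_aliases(self, data):
        return True


shared_yaml = yaml.safe_dump(shared)    # uses &id001 / *id001
copied_yaml = yaml.safe_dump(copied)    # repeats the definition
expanded_yaml = yaml.dump(shared, Dumper=NoAliasDumper)  # anchors suppressed

print(shared_yaml)
print(copied_yaml)
```

If the Haystack dumper builds on the same mechanism, whether anchors appear in the generated YAML would depend on whether the serialized dictionaries share object identity for the tools list, which is worth checking as part of the investigation.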