haystack
[2.0] Component names matter - but is this a feature?
Advent of Haystack Day 1 and 2
Describe the bug
Giving components names that differ from their instance variable names causes the pipeline to malfunction. This works:
from haystack import Pipeline
pipeline = Pipeline()
pipeline.add_component(name="fetcher", instance=fetcher)
pipeline.add_component(name="converter", instance=converter)
pipeline.add_component(name="splitter", instance=splitter)
pipeline.add_component(name="prompt_builder", instance=prompt_builder)
pipeline.add_component(name="llm", instance=llm)
pipeline.connect("fetcher", "converter")
pipeline.connect("converter", "splitter")
pipeline.connect("splitter", "prompt_builder")
pipeline.connect("prompt_builder", "llm")
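query_dict = {
    "urls": ["https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2"],
    "query": "How do you build a custom component?",
}
result = pipeline.run(data={"fetcher": {"urls": query_dict["urls"]}, "prompt_builder": {"query": query_dict["query"]}})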
Now suppose I give the splitter instance the name "preprocessor" instead:
from haystack import Pipeline
pipeline = Pipeline()
pipeline.add_component(name="fetcher", instance=fetcher)
pipeline.add_component(name="converter", instance=converter)
pipeline.add_component(name="preprocessor", instance=splitter)
pipeline.add_component(name="prompt_builder", instance=prompt_builder)
pipeline.add_component(name="llm", instance=llm)
pipeline.connect("fetcher", "converter")
pipeline.connect("converter", "preprocessor")
pipeline.connect("preprocessor", "prompt_builder")
pipeline.connect("prompt_builder", "llm")
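query_dict = {
    "urls": ["https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2"],
    "query": "How do you build a custom component?",
}
result = pipeline.run(data={"fetcher": {"urls": query_dict["urls"]}, "prompt_builder": {"query": query_dict["query"]}})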
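To see why the string passed as name matters, here is a minimal toy model, not Haystack's implementation: a hypothetical ToyPipeline stores components in a dict keyed by the given name, and connect() resolves both endpoints by that name. The Python variable name (splitter) never enters the picture, so a rename works as long as every later reference uses the new name, and a leftover reference to the old name fails.

```python
from typing import Any, Dict, List, Tuple


class ToyPipeline:
    """Hypothetical stand-in for a name-keyed pipeline graph (not Haystack's code)."""

    def __init__(self) -> None:
        self.components: Dict[str, Any] = {}
        self.edges: List[Tuple[str, str]] = []

    def add_component(self, name: str, instance: Any) -> None:
        # The name string is the only handle the toy pipeline keeps for this component.
        if name in self.components:
            raise ValueError(f"A component named '{name}' already exists")
        self.components[name] = instance

    def connect(self, sender: str, receiver: str) -> None:
        # Both endpoints are looked up by name, never by Python variable name.
        for endpoint in (sender, receiver):
            if endpoint not in self.components:
                raise ValueError(f"Component '{endpoint}' not found in the pipeline")
        self.edges.append((sender, receiver))


splitter = object()  # placeholder for the splitter instance from the snippet above

renamed = ToyPipeline()
renamed.add_component(name="preprocessor", instance=splitter)
renamed.add_component(name="prompt_builder", instance=object())
renamed.connect("preprocessor", "prompt_builder")  # consistent name: works

try:
    renamed.connect("splitter", "prompt_builder")  # stale variable name: fails
except ValueError as err:
    print(err)  # Component 'splitter' not found in the pipeline
```

In this toy, renaming is safe as long as add_component, connect, and the run data all use the same string.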
This causes the error below:
Error message
ValueError                                Traceback (most recent call last)
/Users/macpro/Documents/GitHub/Advent-of-Haystack/day1-rag-from-website/notebooks/solution-Advent_of_Haystack_Pipeline_Connecting.ipynb Cell 15 line 7
      1 query_dict ={
      2 "urls": ["https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2"],
      3 "query": "How do you build a custom component?"
      4 }
----> 7 result = pipeline.run(data={"fetcher": {"urls": query_dict["urls"]}, "prompt_builder": {"query": query_dict["query"]}})

File ~/anaconda3/envs/advent-haystack/lib/python3.10/site-packages/haystack/pipeline.py:85, in Pipeline.run(self, data, debug)
     83 is_nested_component_input = all(isinstance(value, dict) for value in data.values())
     84 if is_nested_component_input:
---> 85     return self._run_internal(data=data, debug=debug)
     86 else:
     87     # flat input, a dict where keys are input names and values are the corresponding values
     88     # we need to convert it to a nested dictionary of component inputs and then run the pipeline
     89     # just like in the previous case
     90     pipeline_inputs, unresolved_inputs = self._prepare_component_input_data(data)

File ~/anaconda3/envs/advent-haystack/lib/python3.10/site-packages/haystack/pipeline.py:111, in Pipeline._run_internal(self, data, debug)
    100 """
    101 Runs the pipeline by invoking the underlying run to initiate the pipeline execution.
    102
   (...)
    108 :raises PipelineRuntimeError: if any of the components fail or return unexpected output.
...
- prompt_builder:
    - documents: Any
- llm:
    - generation_kwargs: Optional[Dict[str, Any]]
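The traceback shows Pipeline.run first testing whether data is nested (line 83 of haystack/pipeline.py), i.e. whether every value is a dict of inputs for one component. In the nested form the top-level keys are component names as given to add_component. The sketch below copies that line-83 test; unknown_component_keys is a hypothetical helper (not part of Haystack) that flags top-level keys matching no registered name, which is a quick way to check run data after renaming a component.

```python
from typing import Any, Dict, Iterable, List


def is_nested_component_input(data: Dict[str, Any]) -> bool:
    # Same check as line 83 of haystack/pipeline.py in the traceback:
    # nested input means every value is a dict of inputs for one component.
    return all(isinstance(value, dict) for value in data.values())


def unknown_component_keys(data: Dict[str, Any], names: Iterable[str]) -> List[str]:
    # Hypothetical helper (not part of Haystack): top-level keys of nested
    # run data that do not match any name passed to add_component().
    registered = set(names)
    return sorted(key for key in data if key not in registered)


names = ["fetcher", "converter", "preprocessor", "prompt_builder", "llm"]
data = {
    "fetcher": {"urls": ["https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2"]},
    "prompt_builder": {"query": "How do you build a custom component?"},
}

print(is_nested_component_input(data))  # True
print(unknown_component_keys(data, names))  # []
print(unknown_component_keys({"splitter": {}}, names))  # ['splitter']
```

The run data in this report only names fetcher and prompt_builder, so renaming the splitter does not change which keys it must use.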
Expected behavior
I understood that the name was only used for drawing the pipeline, but changing it seems to cause issues. If the name is meant to be fixed, then a "name" parameter is not needed.
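On the last point: Python passes objects, not variable names, so a method like add_component has no reliable way to learn that the caller's variable was called splitter. That is presumably why the name is an explicit argument. In the snippets above, that string is also what connect() and the keys of pipeline.run(data=...) refer to, so it is more than a drawing label. The toy function below (hypothetical, not Haystack code) shows that one object bound to two variable names looks identical from inside the callee; the only name it can record is the one passed in.

```python
from typing import Any, Dict

registry: Dict[str, Any] = {}


def add_component(name: str, instance: Any) -> None:
    # Hypothetical stand-in: the callee only receives the object and the string.
    # Nothing here can see which variable name the caller used.
    registry[name] = instance


splitter = object()
preprocessor = splitter  # a second variable name bound to the same object

add_component(name="preprocessor", instance=splitter)

print(registry["preprocessor"] is preprocessor)  # True: same object, either variable
print(list(registry))  # ['preprocessor']: only the explicit name was recorded
```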
Additional context
Advent of Haystack Day 2
To Reproduce
Add and connect components using a name that differs from the instance's variable name.
FAQ Check
- [x] Have you had a look at our new FAQ page?
System:
- OS: MacOS Ventura, 13.5.2, Apple M1 chip
- GPU/CPU:
- Haystack version (commit or version number): 2.0.0b
- DocumentStore: NA
- Reader: NA
- Retriever: NA
@lfunderburk I can't reproduce the issue. Given the steps you provided, the following should fail, but it doesn't:
from haystack import Pipeline
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
fetcher = LinkContentFetcher()
converter = HTMLToDocument()
pipeline = Pipeline()
# The component names ("foo", "bar") deliberately differ from the variable names
pipeline.add_component(name="foo", instance=fetcher)
pipeline.add_component(name="bar", instance=converter)
pipeline.connect("foo", "bar")
I can't pinpoint the exact issue either. A snippet that reliably reproduces the issue would be great; even a Colab notebook is fine.