feat: Add `StringJoiner` as a convenience component
Is your feature request related to a problem? Please describe.
As @ju-gu and I have been building pipelines with many branches that branch into many different Prompt Builders we ran into a need for a component like StringJoiner which would concatenate strings into a list of strings. In our specific use case our workflow looks like (except imagine six branches):
ConditionalRouter --> PromptBuilder 1 --> Generator --> AnswerBuilder --> AnswerJoiner |-> PromptBuilder 2 --> Generator --> AnswerBuilder -|
Using the new StringJoiner we would reduce the number of Generator + AnswerBuilder Components we would need to:
ConditionalRouter --> PromptBuilder 1 --> StringJoiner --> OutputAdpater --> Generator --> AnswerBuilder |-> PromptBuilder 2 -|
Describe the solution you'd like Check with the team that adding this component would be okay. If so I'll go ahead and make a PR.
from typing import Dict, List
from haystack import component, logging
from haystack.core.component.types import Variadic
logger = logging.getLogger(__name__)
@component
class StringJoiner:
"""
Component to join strings from different components to a list of strings
"""
@component.output_types(strings=List[str])
def run(self, strings: Variadic[str]) -> Dict[str, List[str]]:
"""
Joins strings into a list of strings
:param strings: strings from different components
"""
strings = list(strings)
return {"strings": strings}
Hey @silvanocerza I made a draft PR with the component here. However, I noticed in testing that the component doesn't actually work in a Pipeline. Specifically it fails this test
@pytest.mark.integration
def test_with_pipeline(self):
string_1 = "What's Natural Language Processing?"
string_2 = "What's is life?"
pipe = Pipeline()
pipe.add_component("prompt_builder_1", PromptBuilder("Builder 1: {{query}}"))
pipe.add_component("prompt_builder_2", PromptBuilder("Builder 2: {{query}}"))
pipe.add_component("joiner", StringJoiner())
pipe.connect("prompt_builder_1.prompt", "joiner.strings")
pipe.connect("prompt_builder_2.prompt", "joiner.strings")
results = pipe.run(data={"prompt_builder_1": {"query": string_1}, "prompt_builder_2": {"query": string_2}})
assert "joiner" in results
assert len(results["joiner"]["strings"]) == 2 # <-- This Fails. List only has one item in it (the string_2).
assert results["joiner"]["strings"] == [
"Builder 1: What's Natural Language Processing?",
"Builder 2: What's is life?",
]
so I wanted to ask if you happen to know whether the Variadic input type in Haystack doesn't work with type str. So Variadic[str]?
@sjrl not sure about this but looking at the type definition of Variadic it seems it doesn't support str
HAYSTACK_VARIADIC_ANNOTATION = "__haystack__variadic_t"
# # Generic type variable used in the Variadic container
T = TypeVar("T")
# Variadic is a custom annotation type we use to mark input types.
# This type doesn't do anything else than "marking" the contained
# type so it can be used in the `InputSocket` creation where we
# check that its annotation equals to CANALS_VARIADIC_ANNOTATION
Variadic: TypeAlias = Annotated[Iterable[T], HAYSTACK_VARIADIC_ANNOTATION]
@davidsbatista thanks for the info. When I talked to @silvanocerza offline he thought this should work. @silvanocerza have you had a chance to look at this yet?
@sjrl have a look at this sample/testing code:
https://github.com/deepset-ai/haystack/blob/main/haystack/testing/sample_components/joiner.py
I think what you want might already be there
I've just tried the StringJoiner from the link I sent you get the same error as with your code
@sjrl @silvanocerza good news - I think "I found the bug", running the StringJoin on the subgrah branch works, you get the expected output:
{'joiner': {'output': "Builder 1: What's Natural Language Processing? Builder 2: What's is life?"}}
Oh interesting, I didn't realize the subgraph branch also changes how the non-cyclic piplines work as well.