feat: Add `StringJoiner` as a convenience component

Open sjrl opened this issue 1 year ago • 7 comments

Is your feature request related to a problem? Please describe.

As @ju-gu and I have been building pipelines whose branches feed many different PromptBuilders, we ran into a need for a component like StringJoiner, which would collect strings from several components into a single list of strings. In our specific use case, the workflow looks like this (except imagine six branches):

ConditionalRouter --> PromptBuilder 1 --> Generator --> AnswerBuilder --> AnswerJoiner
                  |-> PromptBuilder 2 --> Generator --> AnswerBuilder -|

With the new StringJoiner, we would need only one Generator and one AnswerBuilder instead of one pair per branch:

ConditionalRouter --> PromptBuilder 1 --> StringJoiner --> OutputAdapter --> Generator --> AnswerBuilder
                  |-> PromptBuilder 2 -|

Describe the solution you'd like

Check with the team that adding this component would be okay. If so, I'll go ahead and make a PR.

from typing import Dict, List

from haystack import component, logging
from haystack.core.component.types import Variadic

logger = logging.getLogger(__name__)


@component
class StringJoiner:
    """
    Component to join strings from different components to a list of strings
    """
    @component.output_types(strings=List[str])
    def run(self, strings: Variadic[str]) -> Dict[str, List[str]]:
        """
        Joins strings into a list of strings.

        :param strings: Strings sent by the connected upstream components.
        :returns: A dictionary with the key `strings` holding the collected list.
        """
        strings = list(strings)
        return {"strings": strings}

sjrl avatar Sep 11 '24 13:09 sjrl

Hey @silvanocerza, I made a draft PR with the component here. However, I noticed in testing that the component doesn't actually work in a Pipeline. Specifically, it fails this test:

    @pytest.mark.integration
    def test_with_pipeline(self):
        string_1 = "What's Natural Language Processing?"
        string_2 = "What's is life?"

        pipe = Pipeline()
        pipe.add_component("prompt_builder_1", PromptBuilder("Builder 1: {{query}}"))
        pipe.add_component("prompt_builder_2", PromptBuilder("Builder 2: {{query}}"))
        pipe.add_component("joiner", StringJoiner())

        pipe.connect("prompt_builder_1.prompt", "joiner.strings")
        pipe.connect("prompt_builder_2.prompt", "joiner.strings")

        results = pipe.run(data={"prompt_builder_1": {"query": string_1}, "prompt_builder_2": {"query": string_2}})

        assert "joiner" in results
        assert len(results["joiner"]["strings"]) == 2  # <-- This Fails. List only has one item in it (the string_2).
        assert results["joiner"]["strings"] == [
            "Builder 1: What's Natural Language Processing?",
            "Builder 2: What's is life?",
        ]

So I wanted to ask whether you happen to know if the Variadic input type in Haystack doesn't work with type str, i.e. Variadic[str]?
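To make the expectation in the test concrete, here is a toy model of the two behaviours in plain Python. It is a hypothetical sketch, not Haystack's implementation: the function names and the `(input name, value)` message format are invented for illustration. Accumulating every value sent to the same input gives the two-item list the test expects, while keeping only the latest value reproduces the one-item symptom described above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy model, NOT Haystack's code: each tuple is
# (receiver input name, value), mimicking two senders feeding one input.
Message = Tuple[str, str]


def collect_variadic(messages: List[Message]) -> Dict[str, List[str]]:
    """Accumulate every value sent to the same input (what the test expects)."""
    inputs: Dict[str, List[str]] = defaultdict(list)
    for name, value in messages:
        inputs[name].append(value)
    return dict(inputs)


def collect_last_write_wins(messages: List[Message]) -> Dict[str, List[str]]:
    """Keep only the latest value per input (matches the observed symptom)."""
    inputs: Dict[str, List[str]] = {}
    for name, value in messages:
        inputs[name] = [value]  # earlier values for this input are discarded
    return inputs


messages = [
    ("strings", "Builder 1: What's Natural Language Processing?"),
    ("strings", "Builder 2: What's is life?"),
]

print(len(collect_variadic(messages)["strings"]))
print(collect_last_write_wins(messages)["strings"])
```

Whether the failing run actually follows the second pattern is an assumption; the sketch only shows that such a pattern would produce the same result as the failing assertion.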

sjrl avatar Sep 12 '24 08:09 sjrl

@sjrl I'm not sure about this, but looking at the type definition of Variadic here, it seems it doesn't support str:

HAYSTACK_VARIADIC_ANNOTATION = "__haystack__variadic_t"

# Generic type variable used in the Variadic container
T = TypeVar("T")


# Variadic is a custom annotation type we use to mark input types.
# This type doesn't do anything else than "marking" the contained
# type so it can be used in the `InputSocket` creation where we
# check that its annotation equals to CANALS_VARIADIC_ANNOTATION
Variadic: TypeAlias = Annotated[Iterable[T], HAYSTACK_VARIADIC_ANNOTATION]
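As a quick check using only the standard library, the alias above can be parametrized with str like any other type. This is a sketch of the typing behaviour only, not of how Haystack builds its input sockets, and the `TypeAlias` annotation is dropped here purely to avoid an extra import. It suggests the alias definition on its own is not what would rule out str.

```python
from typing import Annotated, Iterable, TypeVar, get_args

HAYSTACK_VARIADIC_ANNOTATION = "__haystack__variadic_t"
T = TypeVar("T")

# Same shape as the alias quoted above, minus the TypeAlias annotation.
Variadic = Annotated[Iterable[T], HAYSTACK_VARIADIC_ANNOTATION]

# Substituting str for T yields Annotated[Iterable[str], <marker>];
# get_args returns the wrapped type followed by the metadata.
inner_type, marker = get_args(Variadic[str])

print(inner_type)
print(marker)
```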

davidsbatista avatar Sep 19 '24 08:09 davidsbatista

@davidsbatista thanks for the info. When I talked to @silvanocerza offline he thought this should work. @silvanocerza have you had a chance to look at this yet?

sjrl avatar Sep 19 '24 09:09 sjrl

@sjrl have a look at this sample/testing code:

https://github.com/deepset-ai/haystack/blob/main/haystack/testing/sample_components/joiner.py

I think what you want might already be there

davidsbatista avatar Sep 19 '24 09:09 davidsbatista

I've just tried the StringJoiner from the link I sent; you get the same error as with your code.

davidsbatista avatar Sep 19 '24 09:09 davidsbatista

@sjrl @silvanocerza good news: I think I found the bug. Running the StringJoiner on the subgraph branch works, and you get the expected output:

{'joiner': {'output': "Builder 1: What's Natural Language Processing? Builder 2: What's is life?"}}
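Note that this output is a single string under the key `output`, whereas the component proposed at the top of the issue returns a list under the key `strings`. The two shapes can be sketched in plain Python as follows. The function names are hypothetical, and the single-space separator is inferred from the printed output rather than taken from the sample component's source.

```python
from typing import Dict, Iterable, List


def join_to_list(strings: Iterable[str]) -> Dict[str, List[str]]:
    """Shape of the proposed component: one list entry per incoming string."""
    return {"strings": list(strings)}


def join_to_text(strings: Iterable[str], separator: str = " ") -> Dict[str, str]:
    """Shape of the output shown above: all strings merged into one.

    The separator default is an assumption based on the printed result.
    """
    return {"output": separator.join(strings)}


prompts = [
    "Builder 1: What's Natural Language Processing?",
    "Builder 2: What's is life?",
]

print(join_to_list(prompts))
print(join_to_text(prompts))
```

For the use case in the original request, the list form keeps each prompt separate so that a downstream adapter can pick one, while the merged form loses the boundaries between prompts.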

davidsbatista avatar Sep 23 '24 13:09 davidsbatista

Oh interesting, I didn't realize the subgraph branch also changes how non-cyclic pipelines work.

sjrl avatar Sep 23 '24 13:09 sjrl