haystack icon indicating copy to clipboard operation
haystack copied to clipboard

ConditionalRouter output type is not working anymore

Open gustavo-has-stone opened this issue 1 year ago • 13 comments

Describe the bug We tried to update the Haystack version to 2.3.1 due to the security update and our code stopped working. We identified that the ConditionalRouter is returning a string instead of the type assigned in output_type

Error message We are getting errors related to the fact that a string is now being returned. Here is an example of the error when running the provided code:

AttributeError: 'str' object has no attribute 'content'

Expected behavior We expected that the output_type provided in the route would be returned

Additional context The problem is related to version 2.3.1. Our provided example works on 2.3.0

To Reproduce The following code works on haystack 2.3.0 but it does not work on haystack 2.3.1:

from haystack.components.routers import ConditionalRouter
from haystack.dataclasses import ChatMessage, ChatRole

routes = [
    {
        "condition": "{{streams|length > 2}}",
        "output": "{{message}}",
        "output_name": "long_streams_message",
        "output_type": ChatMessage,
    },
    {
        "condition": "{{streams|length <= 2}}",
        "output": "{{message}}",
        "output_name": "short_streams_message",
        "output_type": ChatMessage,
    },
]

message = ChatMessage(content='Hi', role=ChatRole('user'), name='', meta={})
print(message.content)

router = ConditionalRouter(routes)

kwargs = {"streams": [1, 2, 3], "message": message}
result = router.run(**kwargs)
print(result['long_streams_message'].content)

FAQ Check

System:

  • OS: Ubuntu 22.04 LTS
  • GPU/CPU: CPU
  • Haystack version (commit or version number): 2.3.1

gustavo-has-stone avatar Aug 02 '24 14:08 gustavo-has-stone

That's expected.

Because of the changes to fix the security issue we had to make this breaking change. The ConditionalRouter can only return the following Python literal types now: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, None and Ellipsis. Any other will be returned as a string.

I would suggest creating a custom Component to cover those cases where the ConditionalRouter can't be used as before.

Something like this should do the trick:

from typing import List

from haystack import component
from haystack.dataclasses import ChatMessage

@component
class Decision:
    @component.output_types(short_streams_message=ChatMessage, long_streams_message=ChatMessage)
    def run(self, streams: List[str], message: ChatMessage):
        if len(streams) > 2:
            return {"long_streams_message": message}
        return {"short_streams_message": message}

silvanocerza avatar Aug 02 '24 14:08 silvanocerza

I see. So the parameter output_type will not work anymore?

We did not expect to see a breaking change with a patch version change. Should we worry about those updates too?

gustavo-has-stone avatar Aug 02 '24 14:08 gustavo-has-stone

output_type has never been necessary to convert the type, it's only used to set the correct output type for the Component. Even before this change you could put any type in there but it wouldn't guarantee that the Component will return that type when calling run. That's only necessary when calling Pipeline.connect() to validate the connection.

We decided to release this security change as a patch release even if it's a breaking change as it's a pretty critical, it could lead to remote code execution.

silvanocerza avatar Aug 02 '24 14:08 silvanocerza

Perhaps we could add an option to the ConditionalRouter so that it can route inputs to the selected route's output without rendering a jinja2 template?

I also had this case, where I wanted to route a list of Documents depending on a reply generated from the LLM. I don't have to go through jinja to route these documents because I don't want to change anything about them.

So something like:

routes = [
{"condition": "{{...}}", output: "documents", output_name: "a_documents", "output_type": List[Document], "use_literal_output": True}
]
router = ConditionalRouter(routes=routes)

router.run(documents=[Document(content="hey")])

# returns the document passed in unchanged

https://github.com/deepset-ai/haystack/blob/e17d0c41926849975166e70ec46411f402c8098e/haystack/components/routers/conditional_router.py#L206

This part would need to be changed to something like:

if route.get("use_literal_output"):
    output = kwargs[route["output"]]
else:
    ...

Would that solve your use case too @gustavo-has-stone ?

mathislucka avatar Aug 03 '24 13:08 mathislucka

In our scenario, we were routing a variable of a custom type (similar to ChatMessage) based on the value of an attribute of this variable. I did not fully understand your example. Could you provide more details?

gustavo-has-stone avatar Aug 05 '24 21:08 gustavo-has-stone

Hi, I ran into this problem too when upgrading to 2.3.1. I use a custom dataclass type for my pipeline which isn't working as expected anymore. Is there a suggested fix for this without creating a custom router?

alexsniffin avatar Aug 06 '24 15:08 alexsniffin

@gustavo-has-stone @alexsniffin

We discussed a bit internally and decided to give the possibility to enable the previous behaviour by creating ConditionalRouter and OutputAdapter with unsafe=True.

I just created a PR to bring that change in and we'll release it on the upcoming 2.4.0 some time next week. For the time being, and if you consider it safe, you can still rely on the previous behaviour by using 2.3.0 instead of the latest 2.3.1.

silvanocerza avatar Aug 08 '24 14:08 silvanocerza

I don't think that using an unsafe version is a good solution for us. We will try to find an alternative solution for our use case

gustavo-has-stone avatar Aug 08 '24 15:08 gustavo-has-stone

Hey @silvanocerza Idk if it`s a good solution because you are publishing a version that the only way to use previews behavior is forcing unsafe. It forces people to open security breaches when need a feature.

An approach that could help us is using Haystack ChatMessage as Router output. Do you know if is it possible?

dmvieira avatar Aug 08 '24 15:08 dmvieira

I don't think that using an unsafe version is a good solution for us. We will try to find an alternative solution for our use case

It's unsafe only if the Jinja templates used are users inputs too. If you know the templates it's completely safe to use.

Hey @silvanocerza Idk if it`s a good solution because you are publishing a version that the only way to use previews behavior is forcing unsafe. It forces people to open security breaches when need a feature.

An approach that could help us is using Haystack ChatMessage as Router output. Do you know if is it possible?

Setting unsafe to True doesn't automatically open you to security breaches. It enables a behaviour that is considered unsafe in certain circumstances, that is when ConditionalRouter or OutputAdapter use templates that can be considered users input.

If you know what you're doing and know you're not using users inputs as Jinja template that is completely fine to use.

silvanocerza avatar Aug 09 '24 09:08 silvanocerza

Let me see if I understand. If we try to route based on some controlled condition (e.g., message length), we will be fine. However, if we try to use the content of a message for conditional routing, we could be in trouble. Is that right?

gustavo-has-stone avatar Aug 09 '24 15:08 gustavo-has-stone

Let me see if I understand. If we try to route based on some controlled condition (e.g., message length), we will be fine. However, if we try to use the content of a message for conditional routing, we could be in trouble. Is that right?

No. You are vulnerable if you let your users write their own templates from scratch.

A simple example:

from typing import List
from haystack.components.routers import ConditionalRouter

user_input = input("Write a Jinja template:")

routes = [
    {
        "condition": user_input,
        "output": "{{streams}}",
        "output_name": "enough_streams",
        "output_type": List[int],
    },
    {
        "condition": "{{streams|length <= 2}}",
        "output": "{{streams}}",
        "output_name": "insufficient_streams",
        "output_type": List[int],
    },
]
router = ConditionalRouter(routes)
...
router.run(whatever_input)

silvanocerza avatar Aug 12 '24 12:08 silvanocerza

Thank you @silvanocerza . It sounds like a plan! 😄

dmvieira avatar Aug 12 '24 14:08 dmvieira