haystack-integrations: create `Argilla` integration for `haystack` nodes

I think it would be cool to be able to log Haystack pipelines and nodes directly to Argilla for monitoring.

Support for prompts and LLMs (https://docs.haystack.deepset.ai/docs/prompts_and_llms) could work similarly to the Argilla callback for LangChain (https://docs.argilla.io/en/latest/guides/llms/practical_guides/use_argilla_callback_in_langchain.html).

However, I can also see support for:

Jul 13 '23 06:07 davidberenstein1957

Does it make sense from your perspective?
In terms of usage, what should be the prioritization for each one of these supported tasks?

Jul 13 '23 06:07 davidberenstein1957

@TuanaCelik, what do you think about this issue?

Jul 14 '23 06:07 davidberenstein1957

@davidberenstein1957 this to me sounds like an amazing idea. @masci @anakin87 wdyt?

In terms of usage, I think the most interesting one might be PromptNode as you mentioned. But I can see it extending to the rest as well

Jul 17 '23 09:07 TuanaCelik

It seems great to me!

I haven't tried Argilla Feedback and Callbacks but I think that by setting up an ArgillaCallbackHandler, you get some "functional" observability:

graphical monitoring of your LLM activity
some data to score/annotate if you then want to fine-tune an LLM.

Jul 17 '23 09:07 anakin87

@anakin87 would you see that as its own component, or as functionality of an existing component? @davidberenstein1957 are you interested in contributing such an integration? Having a package that Haystack users can install and use alongside Haystack to log to Argilla would be amazing.

Jul 17 '23 13:07 TuanaCelik

I am interested in working on this. I am not sure if a separate package makes sense or we should install it through extras? But let me know what makes sense for your part of the codebase.

Jul 19 '23 10:07 davidberenstein1957

Hey @davidberenstein1957 - we're going to do some investigation with @anakin87 on Monday to get a better understanding of how/if the source code should change, or whether this could be implemented as a standalone integration. We will comment back here with our findings 👍

Jul 20 '23 12:07 TuanaCelik

No problem👍

Jul 21 '23 11:07 davidberenstein1957

Hey @davidberenstein1957 we've done some thinking and we see 2 options for you. It's largely your choice which type of integration you would like to build:

1. The Agent in Haystack already has callback functionality. Similar to what you did with LangChain, you could use this callback and implement an ArgillaCallbackHandler, as Stefano suggested. If you decide to do this and find that the callback functionality in Agent isn't enough, we'd be happy to extend it.
2. Since the rest of our nodes, such as PromptNode, don't have callbacks, you could create a custom node that goes before and after the node whose input and output you want to record. You could design this node (or nodes) as something I could add to my pipeline, before and after a node, to record the data to Argilla.

If you decide to go with option 2, @anakin87 had the idea that it could even be a custom node that can wrap another node like the PromptNode
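A minimal sketch of the wrapping idea, assuming nothing about the real Haystack or Argilla APIs: `RecordingWrapper`, `fake_prompt_node`, and the `records` list are hypothetical stand-ins. The wrapper calls the wrapped node and stores the input and output, which is where a real integration would log to Argilla instead.

```python
from typing import Any, Callable, Dict, List


class RecordingWrapper:
    """Hypothetical wrapper: runs the wrapped node and records input and output."""

    def __init__(self, node: Callable[..., Any], sink: List[Dict[str, Any]]) -> None:
        self.node = node
        self.sink = sink  # stand-in for an Argilla dataset

    def run(self, **inputs: Any) -> Any:
        output = self.node(**inputs)
        # A real integration would send this record to Argilla here.
        self.sink.append({"inputs": inputs, "output": output})
        return output


def fake_prompt_node(query: str) -> str:
    # Stand-in for a PromptNode call; a real node would query an LLM.
    return f"answer to: {query}"


records: List[Dict[str, Any]] = []
wrapped = RecordingWrapper(fake_prompt_node, records)
result = wrapped.run(query="What is Argilla?")
```

Because the wrapper only depends on the wrapped node being callable, the same pattern could in principle sit around any node without changing that node's code.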

We're currently designing Haystack v2. Whichever option you prefer, we would be happy to update you if there's any major change that would make your integration better, e.g., if we do end up implementing callbacks for PromptNode etc.

Jul 25 '23 10:07 TuanaCelik

Hi @TuanaCelik @anakin87 ,

I think starting off with the callback functionality would be best in terms of effort vs gain. However, I think an ArgillaNode encompassing a lot of additional tasks would be a more robust solution for the future, and we can work on this once the first integration proves successful.

What do you think?

Aug 12 '23 07:08 davidberenstein1957

Thanks, @davidberenstein1957! That sounds like a good idea to me...

Aug 14 '23 09:08 anakin87

@anakin87 Great! So, from your perspective, should we add the ArgillaCallbackHandler as an integration or as an addition to the main repo? Within LangChain the integration is managed in the main repository but I am not sure what you prefer.

Aug 14 '23 09:08 davidberenstein1957

Sorry! I missed this one... We would like to have ArgillaCallbackHandler as an integration.

Aug 17 '23 13:08 anakin87

@anakin87 - correct me if I'm wrong, but I think currently this would be a good integration and if @davidberenstein1957 is interested in it, we can involve you in Haystack v2 discussions when we start looking into evaluation and observability. You can see our current status on the Discussions page.

Aug 17 '23 13:08 TuanaCelik

@TuanaCelik @anakin87, I noticed that you do not directly allow for customization of the callback_manager.

For me, it would make sense to change haystack/agents/base.py.

This would introduce a base CallbackManager class and allow passing a custom implementation of this class to Agent.__init__().

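from abc import ABC, abstractmethod
from typing import Any, Optional

# Assumed import paths (Haystack 1.x); adjust if they differ in your version.
from haystack.agents.types import Color
from haystack.agents.utils import print_text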

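class BaseCallbackManager(ABC):
    # The placeholder abstract method is replaced here by the hooks that
    # CallbackManager implements below, so that subclasses can be instantiated.
    @abstractmethod
    def on_tool_finish(self, tool_output: str, **kwargs: Any) -> None:
        pass

    @abstractmethod
    def on_new_token(self, token: str, **kwargs: Any) -> None:
        pass

    @abstractmethod
    def on_agent_start(self, **kwargs: Any) -> None:
        pass

    @abstractmethod
    def on_agent_step(self, agent_step, **kwargs: Any) -> None:
        pass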

class CallbackManager(BaseCallbackManager):
    def __init__(self, agent_color: Color = Color.GREEN) -> None:
        self.agent_color = agent_color

    def on_tool_finish(
        self,
        tool_output: str,
        color: Optional[Color] = None,
        observation_prefix: Optional[str] = None,
        llm_prefix: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
        print_text(observation_prefix)  # type: ignore
        print_text(tool_output, color=color)
        print_text(f"\n{llm_prefix}")

    def on_new_token(self, token: str, **kwargs: Any) -> None:
        print_text(token, color=self.agent_color)

    def on_agent_start(self, **kwargs: Any) -> None:
        agent_name = kwargs.pop("name", "react")
        print_text(f"\nAgent {agent_name} started with {kwargs}\n", color=self.agent_color)

    def on_agent_step(self, agent_step, **kwargs: Any) -> None:
        print_text(agent_step.prompt_node_response, end="\n", color=self.agent_color)

I, and others, would then be able to do:

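from haystack.agents import Agent

class ArgillaCallbackManager(CallbackManager):
    def __init__(self, *args, **kwargs):
        # Keep the base attributes (e.g. agent_color) initialized.
        super().__init__(*args, **kwargs)

argilla_callbackmanager = ArgillaCallbackManager()

prompt_node = ...
agent = Agent(
    prompt_node=prompt_node,
    # callback_manager is the argument proposed above; Agent does not accept it yet.
    callback_manager=argilla_callbackmanager,
)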

Let me know if I can pick this up in a PR to the core package.

Aug 21 '23 07:08 davidberenstein1957

I also noticed that callback_manager.on_agent_finish() is being used within Agents but that it is not being defined as a callable. Not sure what the intended behavior would have been for this but it takes agent_step as a param.

Aug 21 '23 07:08 davidberenstein1957

@davidberenstein1957 your observations make sense to me and I would say that introducing a base CallbackManager class is reasonable. @masci @vblagoje any thoughts on this?

Aug 21 '23 09:08 anakin87

@masci @vblagoje

For me, it makes sense to also add a couple of additional callback points and use the same callback manager for both agents and tools from the ToolManager, i.e., also add on_tool_* methods to the CallbackManager. Not sure how fixed you are on the usage of Events, but perhaps we can move to a CallbackManager class to make the function usage more specific and maintainable. Also, this would allow any user to fully customize their callbacks and re-use base logic where needed.
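A rough sketch of what a shared callback manager could look like, under stated assumptions: `ToyAgent`, `ToyTool`, and `RecordingCallbackManager` are hypothetical stand-ins rather than Haystack classes, and `on_tool_start` is an illustrative hook name that does not appear in the code above. The point is only that the agent and its tools report to one object.

```python
from typing import Any, List, Tuple


class RecordingCallbackManager:
    """Hypothetical manager whose hooks are shared by an agent and its tools."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Any]] = []

    def on_agent_start(self, **kwargs: Any) -> None:
        self.events.append(("agent_start", kwargs.get("name")))

    def on_tool_start(self, tool_input: str, **kwargs: Any) -> None:
        # Illustrative hook name, not part of the original proposal.
        self.events.append(("tool_start", tool_input))

    def on_tool_finish(self, tool_output: str, **kwargs: Any) -> None:
        self.events.append(("tool_finish", tool_output))


class ToyTool:
    """Stand-in tool that upper-cases its input and reports to the manager."""

    def __init__(self, callback_manager: RecordingCallbackManager) -> None:
        self.callback_manager = callback_manager

    def run(self, tool_input: str) -> str:
        self.callback_manager.on_tool_start(tool_input)
        output = tool_input.upper()
        self.callback_manager.on_tool_finish(output)
        return output


class ToyAgent:
    """Stand-in agent that delegates every query to a single tool."""

    def __init__(self, tool: ToyTool, callback_manager: RecordingCallbackManager) -> None:
        self.tool = tool
        self.callback_manager = callback_manager

    def run(self, query: str) -> str:
        self.callback_manager.on_agent_start(name="toy")
        return self.tool.run(query)


manager = RecordingCallbackManager()
agent = ToyAgent(ToyTool(manager), manager)
answer = agent.run("hello")
```

With a single manager, the recorded events keep the order in which agent and tool steps actually happened, which is what a monitoring integration would want to log.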

Aug 22 '23 07:08 davidberenstein1957

@masci @vblagoje reminder

Aug 29 '23 07:08 davidberenstein1957

@davidberenstein1957 apologies for the late response; I never got these notifications you sent. I understand your intentions regarding the custom Argilla callback manager. In striving for simplicity and prevention of a class explosion, we used the Events class from events Python library for agent callbacks. Is there a way you can adapt to this approach for the 1.x branch, and then for 2.x, we can talk more about trade-offs of a more explicit callback mechanism and which one to use? Let us know your thoughts.
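To illustrate the event-style approach without depending on the events package, here is a small stand-in: `SimpleEvents` and `EventSlot` are hypothetical classes that only mimic the `+=` subscription style, and the `captured` list stands in for records sent to Argilla. Treat it as a sketch of the pattern, not as the library's implementation.

```python
from typing import Any, Callable, Dict, Iterable, List


class EventSlot:
    """Holds handlers; supports `slot += handler` and `slot(...)` to fire them."""

    def __init__(self) -> None:
        self.handlers: List[Callable[..., None]] = []

    def __iadd__(self, handler: Callable[..., None]) -> "EventSlot":
        self.handlers.append(handler)
        return self

    def __call__(self, *args: Any, **kwargs: Any) -> None:
        for handler in self.handlers:
            handler(*args, **kwargs)


class SimpleEvents:
    """Hypothetical stand-in: one EventSlot attribute per declared event name."""

    def __init__(self, names: Iterable[str]) -> None:
        for name in names:
            setattr(self, name, EventSlot())


captured: List[Dict[str, Any]] = []  # stand-in for records logged to Argilla


def log_start(**kwargs: Any) -> None:
    captured.append({"event": "agent_start", **kwargs})


def log_step(agent_step: Any, **kwargs: Any) -> None:
    captured.append({"event": "agent_step", "step": agent_step})


callback_manager = SimpleEvents(("on_agent_start", "on_agent_step"))
callback_manager.on_agent_start += log_start
callback_manager.on_agent_step += log_step

# The agent would fire these events itself; they are fired by hand here.
callback_manager.on_agent_start(name="react")
callback_manager.on_agent_step("Thought: look it up")
```

An integration built this way can only observe the events the agent chooses to fire, so any hook that is never fired cannot be monitored from the outside.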

Sep 04 '23 09:09 vblagoje

@vblagoje I think this normally would not be a problem, but the hardcoded event calls are still limited. At the moment it is not possible to get the final answer from the Events because no on_agent_finish(final step) method is hardcoded in the Agent class, so it remains difficult to add monitoring options similar to the LangChain integration.

In order to do the 1.x integration, we would need to mimic the methods that have been defined here, which I can handle in a PR if you want to.

I propose altering this line https://github.com/deepset-ai/haystack/blob/d048bb53523b11426d3c39f7cdc0aa8d15f0cf67/haystack/agents/base.py#L362

Sep 06 '23 13:09 davidberenstein1957

That seems reasonable, @davidberenstein1957. Would you please follow up on this idea, @anakin87?

Sep 06 '23 21:09 vblagoje

@anakin87 I would like to handle the PR. Do you think it makes sense to also include other steps outlined by LangChain or to just include the on_agent_final_answer?

Sep 07 '23 07:09 davidberenstein1957

@davidberenstein1957 As this part will be rethought and redesigned in Haystack 2.0, I would say open a PR and include the minimum set of events needed for the Argilla implementation. (thanks for your patience :heart_decoration:)

Sep 07 '23 08:09 anakin87

@anakin87 I outlined the PR. Not sure if it requires doc changes or tests? https://github.com/deepset-ai/haystack/pull/5736/files

Sep 07 '23 09:09 davidberenstein1957

@anakin87 @TuanaCelik , do you have any pointers w.r.t. this PR? We would love to log some more info into Argilla but maybe the Agent implementation is still limited?

We hope to align further with components/node and pipeline/chain monitoring functionality in a follow-up PR :)

Jan 15 '24 17:01 davidberenstein1957

Hi, @anakin87 @TuanaCelik, finally we decided to add the integration for Haystack 1.x with Argilla. But we're excited about the Haystack 2.0 possibilities and we plan a refactoring when agents or callbacks become available (I think that it's planned for the upcoming months). So, we are looking forward to the new updates! :)

Feb 21 '24 11:02 sdiazlor
