haystack-integrations
Create `Argilla` integration for `haystack` nodes
I think it would be cool to be able to monitor Haystack pipelines and nodes directly in Argilla.
https://docs.haystack.deepset.ai/docs/prompts_and_llms could be similar to https://docs.argilla.io/en/latest/guides/llms/practical_guides/use_argilla_callback_in_langchain.html
However, I can also see support for:
- Does it make sense from your perspective?
- In terms of usage, what should be the prioritization for each one of these supported tasks?
@TuanaCelik, what do you think about this issue?
@davidberenstein1957 this to me sounds like an amazing idea. @masci @anakin87 wdyt?
In terms of usage, I think the most interesting one might be PromptNode, as you mentioned. But I can see it extending to the rest as well.
It seems great to me!
I haven't tried Argilla Feedback and Callbacks but I think that by setting up an ArgillaCallbackHandler, you get some "functional" observability:
- graphical monitoring of your LLM activity
- some data to score/annotate if you then want to fine-tune an LLM.
@anakin87 would you see that as its own component, or as a functionality of a component? @davidberenstein1957 are you interested in contributing such an integration? Having a package that Haystack users can install and use alongside Haystack to log to Argilla would be amazing.
I am interested in working on this. I am not sure if a separate package makes sense or we should install it through extras? But let me know what makes sense for your part of the codebase.
Hey @davidberenstein1957 - we're going to do some investigation with @anakin87 on Monday to get a better understanding of how/if the source code should change, or whether this would be possible to implement as a standalone integration. We will comment back here with our findings.
No problem
Hey @davidberenstein1957 we've done some thinking and we see 2 options for you. And it's largely your choice of what type of integration you would like to build:
- The Agent in Haystack already has callback functionality. Similar to what you did with LangChain, you could make use of this callback and implement an ArgillaCallbackHandler, similar to what Stefano suggested. If you decide to do this and find out that our callback functionality in Agent isn't enough, we'd be happy to extend it.
- Since the rest of our nodes, such as PromptNode, don't have callbacks, you could opt to create a custom node that can go before and after the node whose input and output you would like to record. You could design this node (or nodes) as something I could add to my pipeline, before and after a node, so that it records the data to Argilla.
If you decide to go with option 2, @anakin87 had the idea that it could even be a custom node that can wrap another node like the PromptNode
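To make the wrapping idea in option 2 concrete, here is a minimal sketch of what such a wrapper could look like. It is plain Python and does not use the Haystack or Argilla APIs: `RecordingWrapper`, `fake_prompt_node`, and the `sink` callable are hypothetical names invented for illustration, and a real integration would subclass the appropriate Haystack node type and send records to Argilla instead of appending them to a list.

```python
from typing import Any, Callable, Dict, List


class RecordingWrapper:
    """Hypothetical wrapper: calls the wrapped node and records each input/output pair."""

    def __init__(self, node: Callable[..., Any], sink: Callable[[Dict[str, Any]], None]) -> None:
        self.node = node  # any callable standing in for a node such as PromptNode
        self.sink = sink  # receives one record per call; a real integration would log to Argilla

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        output = self.node(*args, **kwargs)
        # Record what went in and what came out, then pass the output through unchanged.
        self.sink({"args": args, "kwargs": kwargs, "output": output})
        return output


def fake_prompt_node(prompt: str) -> str:
    # Stand-in for a real node; it only upper-cases the prompt.
    return prompt.upper()


records: List[Dict[str, Any]] = []
wrapped = RecordingWrapper(fake_prompt_node, records.append)
answer = wrapped("what is haystack?")
```

Because the wrapper returns the wrapped node's output untouched, the rest of the pipeline would not need to know that recording is happening.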
We're currently designing Haystack v2. Whatever option you prefer to go for, we would be happy to update you if there's any major change that would make your integration better, e.g., if we do end up implementing callbacks for PromptNode etc.
Hi @TuanaCelik @anakin87 ,
I think starting off with the callback functionality would be best in terms of effort vs. gain. However, I think an ArgillaNode encompassing a lot of additional tasks would be a more robust solution for the future, and we can work on that once the first integration proves successful.
What do you think?
Thanks, @davidberenstein1957! That sounds like a good idea to me...
@anakin87 Great! So, from your perspective, should we add the ArgillaCallbackHandler as an integration or as an addition to the main repo? Within LangChain the integration is managed in the main repository, but I am not sure what you prefer.
Sorry! I missed this one...
We would like to have ArgillaCallbackHandler as an integration.
@anakin87 - correct me if I'm wrong, but I think currently this would be a good integration and if @davidberenstein1957 is interested in it, we can involve you in Haystack v2 discussions when we start looking into evaluation and observability. You can see our current status on the Discussions page.
@TuanaCelik @anakin87, I noticed that you do not directly allow for customization of the callback_manager.
For me, it would make sense to change haystack/agents/base.py so that it introduces a base CallbackManager class and allows passing a custom implementation of this class to Agent.__init__().
from abc import ABC, abstractmethod
from typing import Any, Optional

from haystack.agents.types import Color
from haystack.agents.utils import print_text


class BaseCallbackManager(ABC):
    # Hooks that every callback manager must implement.
    @abstractmethod
    def on_tool_finish(self, tool_output: str, **kwargs: Any) -> None: ...

    @abstractmethod
    def on_new_token(self, token: str, **kwargs: Any) -> None: ...

    @abstractmethod
    def on_agent_start(self, **kwargs: Any) -> None: ...

    @abstractmethod
    def on_agent_step(self, agent_step, **kwargs: Any) -> None: ...


class CallbackManager(BaseCallbackManager):
    def __init__(self, agent_color: Color = Color.GREEN) -> None:
        self.agent_color = agent_color

    def on_tool_finish(
        self,
        tool_output: str,
        color: Optional[Color] = None,
        observation_prefix: Optional[str] = None,
        llm_prefix: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
        print_text(observation_prefix)  # type: ignore
        print_text(tool_output, color=color)
        print_text(f"\n{llm_prefix}")

    def on_new_token(self, token: str, **kwargs: Any) -> None:
        print_text(token, color=self.agent_color)

    def on_agent_start(self, **kwargs: Any) -> None:
        agent_name = kwargs.pop("name", "react")
        print_text(f"\nAgent {agent_name} started with {kwargs}\n", color=self.agent_color)

    def on_agent_step(self, agent_step, **kwargs: Any) -> None:
        print_text(agent_step.prompt_node_response, end="\n", color=self.agent_color)
I, and others, would then be able to do:
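class ArgillaCallbackManager(CallbackManager):
    def __init__(self, *args, **kwargs):
        # Keep the base behaviour (e.g. agent_color); Argilla-specific setup would go here.
        super().__init__(*args, **kwargs)


argilla_callbackmanager = ArgillaCallbackManager()
prompt_node = ...
agent = Agent(
    prompt_node=prompt_node,
    callback_manager=argilla_callbackmanager,
)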
Let me know if I can pick this up in a PR to the core package.
I also noticed that callback_manager.on_agent_finish() is being used within Agents but that it is not being defined as a callable. Not sure what the intended behavior would have been for this, but it takes agent_step as a param.
@davidberenstein1957 your observations make sense to me and I would say that introducing a base CallbackManager class is reasonable.
@masci @vblagoje any thoughts on this?
@masci @vblagoje
For me, it makes sense to also add a couple of additional callback points and use the same callback manager for both agents and tools from the ToolManager, i.e., also add on_tool_* methods to the CallbackManager. Not sure how fixed you are on the usage of Events, but perhaps we can move to a CallbackManager to make the function usage more specific and maintainable. Also, this would allow any user to fully customize their callbacks and reuse base logic where needed.
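As a rough illustration of a single callback manager shared by an agent and its tools, the sketch below uses plain Python only. The class names, the `on_tool_start` and `on_agent_finish` hooks, and the `run_fake_agent` helper are assumptions made for this example; they are not part of the Haystack API.

```python
from typing import Any, List, Tuple


class BaseCallbackManager:
    """Hypothetical base class: every hook is a no-op that subclasses may override."""

    def on_agent_start(self, **kwargs: Any) -> None:
        pass

    def on_agent_finish(self, final_answer: str, **kwargs: Any) -> None:
        pass

    def on_tool_start(self, tool_input: str, **kwargs: Any) -> None:
        pass

    def on_tool_finish(self, tool_output: str, **kwargs: Any) -> None:
        pass


class RecordingCallbackManager(BaseCallbackManager):
    """Overrides only the hooks it cares about and inherits the rest."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []

    def on_tool_start(self, tool_input: str, **kwargs: Any) -> None:
        self.events.append(("tool_start", tool_input))

    def on_tool_finish(self, tool_output: str, **kwargs: Any) -> None:
        self.events.append(("tool_finish", tool_output))

    def on_agent_finish(self, final_answer: str, **kwargs: Any) -> None:
        self.events.append(("agent_finish", final_answer))


def run_fake_agent(query: str, manager: BaseCallbackManager) -> str:
    # Stand-in for an agent loop: one tool call (string reversal), then a final answer.
    manager.on_agent_start(query=query)
    manager.on_tool_start(query)
    tool_output = query[::-1]
    manager.on_tool_finish(tool_output)
    manager.on_agent_finish(tool_output)
    return tool_output


manager = RecordingCallbackManager()
result = run_fake_agent("abc", manager)
```

With no-op defaults in the base class, a subclass such as an Argilla logger would only need to override the hooks it actually uses.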
@masci @vblagoje reminder
@davidberenstein1957 apologies for the late response; I never got these notifications you sent. I understand your intentions regarding the custom Argilla callback manager. In striving for simplicity and prevention of a class explosion, we used the Events class from events Python library for agent callbacks. Is there a way you can adapt to this approach for the 1.x branch, and then for 2.x, we can talk more about trade-offs of a more explicit callback mechanism and which one to use? Let us know your thoughts.
@vblagoje I think this normally would be a problem but the hardcoded event calls are still limited. At the moment it is not possible to get the final answer from the Events because there is no on_agent_finish (final step) method hardcoded in the Agent class, so it remains difficult to add monitoring options similar to the LangChain integration.
In order to do the 1.x integration, we would need to mimic the methods that have been defined here, which I can handle in a PR if you want.
I propose altering this line https://github.com/deepset-ai/haystack/blob/d048bb53523b11426d3c39f7cdc0aa8d15f0cf67/haystack/agents/base.py#L362
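To illustrate why a final-answer event matters for monitoring, here is a small self-contained sketch. `EventSlot` and `CallbackEvents` are tiny stand-ins written for this example, loosely modelled on the `+=` subscription style of event libraries; they are not the actual Events class that Haystack uses, and `run_fake_agent` is a hypothetical stand-in for the agent loop.

```python
from typing import Any, Callable, Iterable, List


class EventSlot:
    """Stand-in for a single event: handlers subscribe with += and are called in order."""

    def __init__(self) -> None:
        self.handlers: List[Callable[..., None]] = []

    def __iadd__(self, handler: Callable[..., None]) -> "EventSlot":
        self.handlers.append(handler)
        return self

    def __call__(self, *args: Any, **kwargs: Any) -> None:
        for handler in self.handlers:
            handler(*args, **kwargs)


class CallbackEvents:
    """Stand-in container that exposes one EventSlot per declared event name."""

    def __init__(self, names: Iterable[str]) -> None:
        for name in names:
            setattr(self, name, EventSlot())


def run_fake_agent(query: str, callbacks: CallbackEvents) -> str:
    # Stand-in for an agent run: fire the start event, then the final-answer event.
    callbacks.on_agent_start(query=query)
    final_answer = f"answer to: {query}"
    callbacks.on_agent_final_answer(final_answer)
    return final_answer


captured: List[str] = []
callbacks = CallbackEvents(("on_agent_start", "on_agent_final_answer"))
callbacks.on_agent_final_answer += captured.append  # a monitoring handler subscribes here
result = run_fake_agent("What is Argilla?", callbacks)
```

Without an event fired at the final step, a subscriber like `captured.append` would have no hook through which to receive the answer.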
That seems reasonable @davidberenstein1957 Would you please follow up on this idea @anakin87 ?
@anakin87 I would like to handle the PR. Do you think it makes sense to also include other steps outlined by langchain, or to just include the on_agent_final_answer?
@davidberenstein1957 As this part will be rethought and redesigned in Haystack 2.0, I would say open a PR and include the minimum set of events needed for the Argilla implementation. (thanks for your patience :heart_decoration:)
@anakin87 I outlined the PR. Not sure if it requires doc changes or tests? https://github.com/deepset-ai/haystack/pull/5736/files
@anakin87 @TuanaCelik , do you have any pointers w.r.t. this PR? We would love to log some more info into Argilla but maybe the Agent implementation is still limited?
We hope to align further with components/node and pipeline/chain monitoring functionality in a follow-up PR :)
Hi, @anakin87 @TuanaCelik, finally we decided to add the integration for Haystack 1.x with Argilla. But we're excited about the Haystack 2.0 possibilities and we plan a refactoring when agents or callbacks become available (I think that it's planned for the upcoming months). So, we are looking forward to the new updates! :)