haystack-core-integrations icon indicating copy to clipboard operation
haystack-core-integrations copied to clipboard

feat: Add GitHub integration with agent_prompts and github_components

Open julian-risch opened this issue 8 months ago • 4 comments

Related Issues

  • Related to #1650

  • Related to https://github.com/deepset-ai/haystack-experimental/issues/250 @sjrl brought up that one way to keep example files from the experimental Agent around is an integration https://github.com/deepset-ai/haystack-experimental/pull/263#issuecomment-2775755734

Proposed Changes:

  • Move github_components from experimental to a new integration
  • Move agent_prompts from experimental to a new integration

The idea is to enable users to run the example notebook (or a version with updated imports) after having installed this new integration

How did you test it?

New unit tests and I ran all usage examples successfully with a test repo.

I haven't tested it with the notebook yet, which we would need to update first. (tracked by https://github.com/deepset-ai/haystack-cookbook/pull/183 )

Notes for the reviewer

  • I suggest we rename github_token parameter to api_key for consistency with many other integrations.
  • While we could find a way to set up integration tests, I would rather leave them out of this PR.
  • GithubRepositoryViewer has a branch parameter in the run method, which could also be named ref to make more clear it can also be a tag or commit hash. I prefer keeping the parameter name branch.
  • ~~Some components have github_token: Optional[Secret] = None, because they can work without any token while others use Secret.from_env_var("GITHUB_TOKEN"). I suggest we use Secret.from_env_var("GITHUB_TOKEN", strict=False) where we currently have None as the default.~~
  • ~~The internal implementation of the components differs in how they use _get_headers or _get_request_headers or define headers inline. We could refactor that.~~

Checklist

julian-risch avatar Apr 10 '25 12:04 julian-risch

@julian-risch maybe a general comment on the structure here. I see that the prompts aren't being used within the library and I understand they will be used in a future tutorial/colab.

I wonder then if it would be helpful to instead pre-assemble the tools within the repo so users could easily import the tools and immediately pass them to an Agent. What do you think?

sjrl avatar Apr 29 '25 10:04 sjrl

@julian-risch overall this looks really good! I mostly have minor comments and only one larger conceptual one about maybe providing users Tools directly instead of needing to compose them, themselves.

I didn't comb through every line since there is a lot, but it's well tested so it's good to go from my perspective! We can always make quick updates to this if things arise and depending on usage.

sjrl avatar Apr 29 '25 12:04 sjrl

@sjrl I added a test called test_pipeline_serialization, added _get_request_headers to all components and here is the example notebook with updated code up until the GitHub token is required. I commented out "message": {"source": "documents", "handler": message_handler}, because it didn't work for me and need to ask @mathislucka for advice.

https://colab.research.google.com/drive/1ktlwQ-CDLGDs2uZXvzgG8XspfjPidYqZ?usp=sharing

@sjrl If GitHubFileEditorTool looks good to you, I will add tools for all other components and probably update the directory structure a bit.

julian-risch avatar May 06 '25 07:05 julian-risch

@sjrl I added a test called test_pipeline_serialization, added _get_request_headers to all components and here is the example notebook with updated code up until the GitHub token is required. I commented out "message": {"source": "documents", "handler": message_handler}, because it didn't work for me and need to ask @mathislucka for advice.

@julian-risch This is related to the change we made to tools to have a new variable called outputs_to_string. So the google colab code should be updated to

    ...
    outputs_to_state={
        #"message": {"source": "documents", "handler": message_handler}, TODO
        "documents": {"source": "documents"},
    },
    outputs_to_string={"source": "documents", "handler": message_handler}
    ...

sjrl avatar May 06 '25 08:05 sjrl

@sjrl Finally ready for another review! We're using "data" instead of "init_parameters" in serialization now and all newly implemented tools expose outputs_to_string, inputs_from_state, and outputs_to_state as init parameters. I tested this by running the updated notebook. The Agent forked the repo and committed to a branch.

What do you think about the parameter name github_token? In almost every other place, we use api_key, so for consistency with the many other integrations, we could rename github_token to api_key or leave it as is.

julian-risch avatar May 27 '25 11:05 julian-risch