feat: Add GitHub integration with agent_prompts and github_components
Related Issues
-
Related to #1650
-
Related to https://github.com/deepset-ai/haystack-experimental/issues/250 @sjrl brought up that one way to keep example files from the experimental Agent around is an integration https://github.com/deepset-ai/haystack-experimental/pull/263#issuecomment-2775755734
Proposed Changes:
- Move github_components from experimental to a new integration
- Move agent_prompts from experimental to a new integration
The idea is to enable users to run the example notebook (or a version with updated imports) after having installed this new integration
How did you test it?
New unit tests and I ran all usage examples successfully with a test repo.
I haven't tested it with the notebook yet, which we would need to update first. (tracked by https://github.com/deepset-ai/haystack-cookbook/pull/183 )
Notes for the reviewer
- I suggest we rename
github_tokenparameter toapi_keyfor consistency with many other integrations. - While we could find a way to set up integration tests, I would rather leave them out of this PR.
- GithubRepositoryViewer has a
branchparameter in the run method, which could also be named ref to make more clear it can also be a tag or commit hash. I prefer keeping the parameter namebranch. - ~~Some components have
github_token: Optional[Secret] = None,because they can work without any token while others useSecret.from_env_var("GITHUB_TOKEN"). I suggest we useSecret.from_env_var("GITHUB_TOKEN", strict=False)where we currently haveNoneas the default.~~ - ~~The internal implementation of the components differs in how they use
_get_headersor_get_request_headersor define headers inline. We could refactor that.~~
Checklist
- I have read the contributors guidelines and the code of conduct
- I have updated the related issue with new insights and changes
- I added unit tests and updated the docstrings
- I've used one of the conventional commit types for my PR title:
fix:,feat:,build:,chore:,ci:,docs:,style:,refactor:,perf:,test:.
@julian-risch maybe a general comment on the structure here. I see that the prompts aren't being used within the library and I understand they will be used in a future tutorial/colab.
I wonder then if it would be helpful to instead pre-assemble the tools within the repo so users could easily import the tools and immediately pass them to an Agent. What do you think?
@julian-risch overall this looks really good! I mostly have minor comments and only one larger conceptual one about maybe providing users Tools directly instead of needing to compose them, themselves.
I didn't comb through every line since there is a lot, but it's well tested so it's good to go from my perspective! We can always make quick updates to this if things arise and depending on usage.
@sjrl I added a test called test_pipeline_serialization, added _get_request_headers to all components and here is the example notebook with updated code up until the GitHub token is required. I commented out "message": {"source": "documents", "handler": message_handler}, because it didn't work for me and need to ask @mathislucka for advice.
https://colab.research.google.com/drive/1ktlwQ-CDLGDs2uZXvzgG8XspfjPidYqZ?usp=sharing
@sjrl If GitHubFileEditorTool looks good to you, I will add tools for all other components and probably update the directory structure a bit.
@sjrl I added a test called
test_pipeline_serialization, added_get_request_headersto all components and here is the example notebook with updated code up until the GitHub token is required. I commented out"message": {"source": "documents", "handler": message_handler},because it didn't work for me and need to ask @mathislucka for advice.
@julian-risch This is related to the change we made to tools to have a new variable called outputs_to_string. So the google colab code should be updated to
...
outputs_to_state={
#"message": {"source": "documents", "handler": message_handler}, TODO
"documents": {"source": "documents"},
},
outputs_to_string={"source": "documents", "handler": message_handler}
...
@sjrl Finally ready for another review! We're using "data" instead of "init_parameters" in serialization now and all newly implemented tools expose outputs_to_string, inputs_from_state, and outputs_to_state as init parameters.
I tested this by running the updated notebook. The Agent forked the repo and committed to a branch.
What do you think about the parameter name github_token? In almost every other place, we use api_key, so for consistency with the many other integrations, we could rename github_token to api_key or leave it as is.