
[GitHub Loader] Add support for loading specific folder and branch of a github repository

Open deshraj opened this issue 1 year ago • 8 comments

🚀 The feature

Users want to load a specific folder from a GitHub repository. They also want to load data from a specific branch rather than only the default branch.

Screenshot 2024-02-06 at 11 17 27 AM

Motivation, pitch

Requested by a user in the Discord community: https://discord.com/channels/1121119078191480945/1125758905310519327/1204150824868126790

deshraj avatar Feb 06 '24 19:02 deshraj

@deshraj Can I pick this up?

Dev-Khant avatar Mar 04 '24 05:03 Dev-Khant

@Dev-Khant sure go for it.

deshraj avatar Mar 04 '24 05:03 deshraj

Hi @deshraj,

To get data for a repo, a specific branch, and a specific folder, I think using the get_repo function from the PyGithub library would be easier than the current approach of cloning the repo and then traversing the tree. To extract a specific file, we can use get_contents directly.

Docs:

  1. get_repo: https://pygithub.readthedocs.io/en/latest/examples/Repository.html#get-all-of-the-contents-of-the-root-directory-of-the-repository
  2. get_contents: https://pygithub.readthedocs.io/en/latest/examples/Repository.html#get-a-specific-content-file

I have previously used this approach here: https://github.com/Dev-Khant/Analyze-Github-Code/blob/main/LLM/scrap.py#L33

This change will only affect queries with type=="repo". Let me know if I can go ahead with this approach.
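A minimal sketch of the idea, not the loader's actual implementation: it assumes PyGithub's `get_repo` and `get_contents(path, ref=...)`, and the helper names `iter_folder_files` and `load_github_folder` are hypothetical. The traversal takes any object that behaves like a PyGithub `Repository`, so it can be tried without network access.

```python
from typing import Dict, Iterator, Optional, Tuple


def iter_folder_files(
    repo, folder: str = "", branch: Optional[str] = None
) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, raw bytes) for every file under `folder` on `branch`.

    `repo` is expected to expose get_contents(path, ref=...) like a PyGithub
    Repository. When no branch is given, `ref` is omitted so that GitHub
    falls back to the repository's default branch.
    """
    kwargs = {"ref": branch} if branch else {}
    pending = [folder]  # directories (or a single file path) still to visit
    while pending:
        entries = repo.get_contents(pending.pop(), **kwargs)
        # get_contents returns one object for a file path, a list for a directory.
        if not isinstance(entries, list):
            entries = [entries]
        for entry in entries:
            if entry.type == "dir":
                pending.append(entry.path)
            elif entry.type == "file":
                yield entry.path, entry.decoded_content


def load_github_folder(
    repo_name: str,
    folder: str = "",
    branch: Optional[str] = None,
    token: Optional[str] = None,
) -> Dict[str, bytes]:
    """Hypothetical convenience wrapper; requires PyGithub to be installed."""
    from github import Auth, Github  # imported lazily so the traversal stays testable

    client = Github(auth=Auth.Token(token)) if token else Github()
    return dict(iter_folder_files(client.get_repo(repo_name), folder, branch))
```

For example, `load_github_folder("owner/repo", folder="docs", branch="dev")` (placeholder values) would return a mapping from file path to file contents for the `docs` folder on the `dev` branch, without cloning the repository.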

Dev-Khant avatar Mar 06 '24 14:03 Dev-Khant

Yeah, this seems like a reasonable approach to me as well. Please proceed with it.

deshraj avatar Mar 06 '24 18:03 deshraj

@deshraj Do we need to store the data from results here? Currently the data variable is already being replaced by self._get_github_repo_data. Let me know whether we want to add the data from results or keep just the contents of the repo.

Screenshot 2024-03-08 at 2 46 58 PM

Dev-Khant avatar Mar 08 '24 09:03 Dev-Khant

Ah good catch. This seems like a bug and should be fixed. Can you please fix it in your PR?

deshraj avatar Mar 08 '24 09:03 deshraj

Yes, I can fix it. But should we add results to data, or keep just the repo contents?

Dev-Khant avatar Mar 08 '24 09:03 Dev-Khant

@deshraj I have raised the PR.

Dev-Khant avatar Mar 08 '24 16:03 Dev-Khant