opendal
GitHub service
I would like to consider using the GitHub repository contents API as a service.
The API is documented here: https://docs.github.com/en/rest/repos/contents?apiVersion=2022-11-28
My use case is for GritQL. When running on the CLI, we pull files directly from the file system, but in the cloud I'd like to pull files from GitHub.
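For reference, a minimal sketch of what reading a single file through the contents API involves. The helper names `contents_url` and `decode_file` are hypothetical and not part of OpenDAL. The sketch assumes the default JSON response, in which a file entry carries its data in a base64-encoded `content` field.

```python
import base64
import json
from urllib.parse import quote

API_ROOT = "https://api.github.com"


def contents_url(owner, repo, path, ref=None):
    """Build the URL for GET /repos/{owner}/{repo}/contents/{path}."""
    # Keep "/" unescaped so nested paths stay readable; escape everything else.
    url = f"{API_ROOT}/repos/{owner}/{repo}/contents/{quote(path.strip('/'))}"
    if ref is not None:
        # Optional branch, tag, or commit to read from.
        url += f"?ref={quote(ref, safe='')}"
    return url


def decode_file(response_body):
    """Return the raw bytes of a file from a contents API JSON response."""
    entry = json.loads(response_body)
    if entry.get("type") != "file" or entry.get("encoding") != "base64":
        raise ValueError("expected a base64-encoded file entry")
    # b64decode ignores the line breaks that may appear in the payload.
    return base64.b64decode(entry["content"])
```

The HTTP request itself is left out on purpose, so the sketch only covers URL construction and response decoding.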
@Xuanwo I can do this.
Thanks a lot, have fun!
GitHub can only create a folder by creating a file inside it, so the entries returned by the list operation may differ from what was actually created. Do you have any suggestions for this situation?
So GitHub should not be marked as create_dir: true.
@morgante Does that solve your problem?
Thanks for working on this! I gave it a try and noticed two issues:
- Access token shouldn't be required for reading from public repos
- Listing large repos is much slower than I anticipated. I think it's because you are using the contents API and scanning recursively. For listing large trees, the Git Trees API is much more efficient.
About the first: reading from public repos really doesn't need an access token, but if we don't require access tokens, what do we do when writing? Report an error? Also, unauthenticated users can easily hit the rate limit: https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28
Yes, I think it's fine to have an error at write-time. This is also going to happen if the access token you provide has read access but not write access.
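A minimal sketch of that behavior, assuming a hypothetical `build_headers` helper and `MissingTokenError` type (neither is OpenDAL's actual API): reads go out without an `Authorization` header when no token is configured, and the error only surfaces when a write is attempted.

```python
class MissingTokenError(Exception):
    """Raised when a write is attempted without an access token (hypothetical)."""


def build_headers(token=None, for_write=False):
    """Build GitHub REST API request headers; the token is optional for reads."""
    if for_write and not token:
        # Fail at write time rather than when the service is configured.
        raise MissingTokenError("writing requires an access token")
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

A token that has read access but no write access would still pass this check and be rejected by GitHub itself, so the write path has to handle a server-side permission error either way.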
@morgante I solved the first problem, but the Git Trees API needs a tree SHA to fetch a tree. If we want to list a subdirectory, we must fetch the root first, so this may not increase the speed. I have not figured out how to solve it.
It looks like the repo contents API returns a sha. So you could do this:
- Initial request: GET /repos/{owner}/{repo}/contents/{path}
- If any subdirs are returned in the response, call the tree API with recursive=true to grab them.
This should still be a major speed-up over the current approach.
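The two steps above can be sketched roughly as follows. The function names are hypothetical and the HTTP calls are omitted. The sketch assumes that each entry in a directory listing from the contents API carries `type`, `path`, and `sha` fields, and that the tree API response has a `tree` array whose paths are relative to the requested tree, plus a `truncated` flag.

```python
API_ROOT = "https://api.github.com"


def tree_url(owner, repo, tree_sha):
    """Build the URL for a recursive GET /repos/{owner}/{repo}/git/trees/{tree_sha}."""
    return f"{API_ROOT}/repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=true"


def subdir_tree_urls(owner, repo, contents_entries):
    """Map each sub-directory path in a contents listing to its tree URL."""
    return {
        entry["path"]: tree_url(owner, repo, entry["sha"])
        for entry in contents_entries
        if entry["type"] == "dir"
    }


def flatten_tree(dir_path, tree_response):
    """Convert a tree response into full repo paths plus the truncated flag."""
    paths = []
    for item in tree_response["tree"]:
        # Tree paths are relative to the tree root, so prefix the directory.
        full = f"{dir_path}/{item['path']}"
        # Mark nested directories with a trailing slash.
        paths.append(full + "/" if item["type"] == "tree" else full)
    return paths, tree_response.get("truncated", False)
```

With this shape, a listing costs one contents request plus one tree request per immediate subdirectory, instead of one request per directory at every depth. If `truncated` comes back true, the caller would still need a fallback for that subtree.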