GitHub service

Open morgante opened this issue 1 year ago • 10 comments

I would like to consider using the GitHub repository contents API as a service.

The API is documented here: https://docs.github.com/en/rest/repos/contents?apiVersion=2022-11-28

My use case is for GritQL. When running on the CLI, we pull files directly from the file system, but in the cloud I'd like to pull files from GitHub.

morgante avatar Feb 23 '24 04:02 morgante

@Xuanwo I can do this.

hoslo avatar Feb 26 '24 10:02 hoslo

Thanks a lot, have fun!

Xuanwo avatar Feb 26 '24 10:02 Xuanwo

GitHub can only create a folder by creating a file inside it, so what the list operation returns may differ from what was actually created. Do you have any suggestions for this situation?

hoslo avatar Feb 27 '24 04:02 hoslo

So the GitHub service should not be marked as create_dir: true.

Xuanwo avatar Feb 28 '24 04:02 Xuanwo

@morgante Does that solve your problem?

hoslo avatar Mar 20 '24 03:03 hoslo

Thanks for working on this! I gave it a try and noticed two issues:

  • Access token shouldn't be required for reading from public repos
  • Listing large repos is much slower than I anticipated. I think it's because you are using the contents API and scanning recursively. For listing large trees, the Git trees API is much more efficient.

morgante avatar Mar 22 '24 05:03 morgante

On the first point: reading from public repos really doesn't need an access token. But if we don't require one, what should we do on write, report an error? Also, unauthenticated users can easily hit the rate limit: https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28

hoslo avatar Mar 22 '24 06:03 hoslo

Yes, I think it's fine to have an error at write-time. This is also going to happen if the access token you provide has read access but not write access.
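
To make the idea concrete, here is a rough sketch of "token optional for reads, error at write-time". It is written in Python for brevity and is not how the OpenDAL service is implemented; `build_headers`, `prepare_write`, and `PermissionDenied` are hypothetical names made up for illustration. The header names and values follow the GitHub REST API docs.

```python
from typing import Optional


class PermissionDenied(Exception):
    """Hypothetical error raised when a write is attempted without a token."""


def build_headers(token: Optional[str]) -> dict:
    """Build request headers; the Authorization header is added only if a token exists."""
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def prepare_write(token: Optional[str]) -> dict:
    """Writes always need credentials, so fail early instead of sending the request."""
    if not token:
        raise PermissionDenied("writing to a repository requires an access token")
    return build_headers(token)
```

A read against a public repo would use `build_headers(None)`, while any write path would go through `prepare_write` and surface the error to the caller. A token with read-only scope would pass this check and fail later with an error from GitHub itself.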

morgante avatar Mar 22 '24 16:03 morgante

@morgante I solved the first problem. However, the Git trees API needs a tree SHA to fetch a tree, so to get a subdirectory we must fetch the root first. This may not improve the speed, and I have not figured out how to solve it.

hoslo avatar Apr 01 '24 02:04 hoslo

It looks like the repo contents API returns a sha. So you could do this:

  • Initial request: GET /repos/{owner}/{repo}/contents/{path}
  • If any subdirs are returned in the response, call the tree API with recursive=true to grab them.

This should still be a major speed-up over the current approach.
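
A rough sketch of the two steps, in Python for brevity and without the actual HTTP calls. The helper names (`tree_url`, `subdir_requests`, `full_paths`) are made up for illustration, and it assumes the `sha` returned for a `dir` entry by the contents API can be passed to the trees API as the tree SHA. Paths in a tree response are relative to that tree, so they are re-prefixed with the directory path.

```python
API = "https://api.github.com"


def contents_url(owner: str, repo: str, path: str) -> str:
    """Step 1: list a single directory with the contents API."""
    return f"{API}/repos/{owner}/{repo}/contents/{path}"


def tree_url(owner: str, repo: str, tree_sha: str) -> str:
    """Step 2: fetch a whole subtree in one request."""
    return f"{API}/repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=true"


def subdir_requests(owner: str, repo: str, contents: list) -> list:
    """Pair each directory in a contents response with its recursive tree URL."""
    return [
        (entry["path"], tree_url(owner, repo, entry["sha"]))
        for entry in contents
        if entry["type"] == "dir"
    ]


def full_paths(dir_path: str, tree_response: dict) -> list:
    """Turn tree-relative blob paths into repo-relative file paths."""
    return [
        f"{dir_path}/{entry['path']}"
        for entry in tree_response["tree"]
        if entry["type"] == "blob"
    ]
```

This costs one request for the starting directory plus one per immediate subdirectory, instead of one per directory at every depth. The tree response also carries a `truncated` flag for very large trees, so a real implementation would need a fallback when it is set.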

morgante avatar Apr 01 '24 19:04 morgante