
Add support to read from git/github/gitlab directly

Open Xuanwo opened this issue 3 years ago • 5 comments

Git support doesn't seem straightforward (do we need to git fetch with --depth 1?)

For Github:

  • Get a branch: /repos/{owner}/{repo}/branches/{branch}
  • Get a tree: /repos/{owner}/{repo}/git/trees/{tree_sha}

The problem is that we need to fetch the whole tree index before we can read /abc/def.
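To make the cost concrete, here is a minimal sketch of resolving /abc/def by walking trees one level at a time. The two endpoint builders mirror the paths listed above; `Entry`, `Tree`, `resolve`, and the `fetch_tree` callback are hypothetical names for illustration, and the callback stands in for an HTTP GET of the tree endpoint rather than any real client.

```rust
use std::collections::HashMap;

/// Path of the "get a branch" endpoint listed above.
fn branch_endpoint(owner: &str, repo: &str, branch: &str) -> String {
    format!("/repos/{owner}/{repo}/branches/{branch}")
}

/// Path of the "get a tree" endpoint listed above.
fn tree_endpoint(owner: &str, repo: &str, tree_sha: &str) -> String {
    format!("/repos/{owner}/{repo}/git/trees/{tree_sha}")
}

/// One entry of a tree listing (hypothetical, simplified shape).
#[derive(Clone, Debug, PartialEq)]
struct Entry {
    sha: String,
    is_tree: bool,
}

/// A tree listing: entry name -> entry.
type Tree = HashMap<String, Entry>;

/// Resolve `path` starting from `root_sha`, fetching one tree per path
/// component. `fetch_tree` is an assumed callback that would issue a GET
/// against `tree_endpoint` for the given sha.
fn resolve<F>(root_sha: &str, path: &str, mut fetch_tree: F) -> Option<Entry>
where
    F: FnMut(&str) -> Option<Tree>,
{
    let mut current = Entry {
        sha: root_sha.to_string(),
        is_tree: true,
    };
    for part in path.split('/').filter(|p| !p.is_empty()) {
        // A blob in the middle of the path cannot be descended into.
        if !current.is_tree {
            return None;
        }
        let tree = fetch_tree(&current.sha)?;
        current = tree.get(part)?.clone();
    }
    Some(current)
}
```

With this shape, reading /abc/def costs one branch lookup plus one tree request per path component before any file content is fetched. The tree endpoint also accepts a `recursive` query parameter, but that returns the full index in one response, which is the same problem in a different form.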

Xuanwo avatar Jun 10 '22 04:06 Xuanwo

Doesn't make sense. Let's close.

Xuanwo avatar Jul 28 '22 16:07 Xuanwo

Actually, I think it would be fun to support partial reads for git repos. With this feature, we would be able to read some files from a git repo without cloning all of it, and without having to worry about cleaning a local cache.

Git already supports sparse-checkout, and here is the tech-doc for developers

Still, the git protocol is really hard to understand, and I'm not sure how to implement this.
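For comparison, this is roughly what the command-line route looks like: a shallow, blob-less, sparse clone followed by `git sparse-checkout set`. It is only a sketch; the helper names `clone_args`, `sparse_set_args`, and `run_git` are made up for illustration, and it assumes a `git` binary on PATH. Note that it still writes a local working directory, so it does not remove the local cache concern.

```rust
use std::process::Command;

/// Arguments for a shallow, blob-less, sparse clone of `url` into `dest`.
fn clone_args(url: &str, dest: &str) -> Vec<String> {
    ["clone", "--depth", "1", "--filter=blob:none", "--sparse", url, dest]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Arguments that restrict the checkout in `dest` to the given directories.
fn sparse_set_args(dest: &str, dirs: &[&str]) -> Vec<String> {
    let mut args: Vec<String> = ["-C", dest, "sparse-checkout", "set"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    args.extend(dirs.iter().map(|d| d.to_string()));
    args
}

/// Run `git` with the given arguments and report whether it exited successfully.
fn run_git(args: &[String]) -> std::io::Result<bool> {
    Ok(Command::new("git").args(args).status()?.success())
}
```

Calling `run_git(&clone_args(url, dest))` and then `run_git(&sparse_set_args(dest, &["abc"]))` would fetch blobs only for the `abc` directory, but it shells out to git rather than speaking the protocol directly.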

DCjanus avatar Dec 14 '22 03:12 DCjanus

Seems rust-lang's git2 doesn't support this yet?

Xuanwo avatar Dec 14 '22 07:12 Xuanwo

For GitHub, we can fetch contents via https://docs.github.com/en/rest/repos/contents?apiVersion=2022-11-28
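A rough sketch of what a request to that API could look like. It only builds the URL and headers and does not send anything; `encode_segment`, `contents_url`, and `contents_headers` are hypothetical helper names, and the `api.github.com` host assumes github.com rather than an Enterprise install.

```rust
/// Percent-encode one path segment, keeping only RFC 3986 unreserved characters.
fn encode_segment(seg: &str) -> String {
    let mut out = String::new();
    for b in seg.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Build the contents URL for `path`, optionally pinned to a branch, tag, or commit.
fn contents_url(owner: &str, repo: &str, path: &str, git_ref: Option<&str>) -> String {
    let encoded: Vec<String> = path
        .split('/')
        .filter(|s| !s.is_empty())
        .map(encode_segment)
        .collect();
    let mut url = format!(
        "https://api.github.com/repos/{owner}/{repo}/contents/{}",
        encoded.join("/")
    );
    if let Some(r) = git_ref {
        url.push_str("?ref=");
        url.push_str(&encode_segment(r));
    }
    url
}

/// Headers asking for the raw file body and pinning the API version from the link above.
fn contents_headers() -> Vec<(&'static str, &'static str)> {
    vec![
        ("Accept", "application/vnd.github.raw+json"),
        ("X-GitHub-Api-Version", "2022-11-28"),
    ]
}
```

This reads a single file by path without fetching the tree index first, which sidesteps the problem from the first comment, at the cost of being GitHub-specific.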

Xuanwo avatar Dec 14 '22 07:12 Xuanwo

> Seems rust-lang's git2 doesn't support this yet?

Yes, which means we have to implement the protocol ourselves. That's why I posted the tech doc link.
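To give a sense of what implementing it ourselves involves, here is a small sketch of the pkt-line framing that protocol v2 requests are built from, using an `ls-refs` command as the example. The function names are hypothetical, and the request is deliberately incomplete: a real client would normally also send capability lines (for example `agent=...`) before the delimiter.

```rust
/// Flush packet: ends a request.
const FLUSH: &str = "0000";
/// Delimiter packet: separates the command/capability section from its arguments.
const DELIM: &str = "0001";

/// Frame one payload as a pkt-line: a 4-digit hex length that counts
/// the 4 length bytes themselves, followed by the payload.
fn pkt_line(payload: &str) -> String {
    format!("{:04x}{}", payload.len() + 4, payload)
}

/// Build a minimal protocol v2 `ls-refs` request limited to the given ref prefixes.
fn ls_refs_request(prefixes: &[&str]) -> String {
    let mut req = pkt_line("command=ls-refs\n");
    req.push_str(DELIM);
    for p in prefixes {
        req.push_str(&pkt_line(&format!("ref-prefix {p}\n")));
    }
    req.push_str(FLUSH);
    req
}
```

Framing is the easy part. Negotiating and parsing the packfile that comes back from a fetch is where most of the work would be.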

DCjanus avatar Dec 14 '22 08:12 DCjanus

After a long time, I think I should explain why this feature is difficult to implement. In Git protocol v2, if we try to implement the list or stat method, then in order to get the size of a remote object we have to actually pull the object data to the local machine, which is too heavy for both list and stat.

DCjanus avatar May 06 '23 08:05 DCjanus

fsspec will clone the repo to local first 🤣

Xuanwo avatar May 06 '23 15:05 Xuanwo

There hasn't been enough interest so far, so let's close for now. Thanks @DCjanus for the information.

Xuanwo avatar Aug 21 '23 08:08 Xuanwo