fs: working with remote git repos

Open mike0sv opened this issue 3 years ago • 2 comments

In mlem we need an fsspec implementation for remote git repos. For now we rely on the built-in GithubFileSystem for GitHub, but it does not support git credentials, and we also want to support URLs like ssh://git@... or git://... that point to git repos. The easiest way imo would be to just clone the repo to a temporary dir and then delegate to LocalFileSystem (with some path hacking), but I am no expert on git internals.
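A minimal sketch of the clone-then-delegate idea, assuming the `git` CLI is installed and shelling out to it so that ssh://, git:// and any configured credential helpers are handled by git itself. The names `clone_command`, `to_local_path` and `ClonedRepo` are hypothetical and exist only for this illustration; a real version would presumably wrap the same logic in an fsspec filesystem class.

```python
import posixpath
import subprocess
import tempfile
from pathlib import Path


def clone_command(url, dest, rev=None):
    """Build the argv for a shallow `git clone` of `url` into `dest`."""
    cmd = ["git", "clone", "--quiet", "--depth", "1"]
    if rev is not None:
        # `--branch` takes a branch or tag name, not an arbitrary commit SHA.
        cmd += ["--branch", rev]
    # `--` keeps a URL that starts with "-" from being read as an option.
    return cmd + ["--", url, str(dest)]


def to_local_path(root, repo_path):
    """Map a repo-relative path into the clone (the "path hacking" step)."""
    # Anchoring at "/" before normalizing clamps any leading ".." segments,
    # so the result always stays under `root`.
    relative = posixpath.normpath("/" + repo_path).lstrip("/")
    return Path(root) / relative


class ClonedRepo:
    """Hypothetical read-only view over a temporary shallow clone."""

    def __init__(self, url, rev=None):
        self._tmp = tempfile.TemporaryDirectory()
        self.root = Path(self._tmp.name)
        subprocess.run(clone_command(url, self.root, rev), check=True)

    def ls(self, repo_path=""):
        base = to_local_path(self.root, repo_path)
        # Hide git's own metadata directory from listings.
        return sorted(p.name for p in base.iterdir() if p.name != ".git")

    def open(self, repo_path, mode="rb"):
        if mode not in ("r", "rb"):
            raise ValueError("read-only: mode must be 'r' or 'rb'")
        return open(to_local_path(self.root, repo_path), mode)

    def close(self):
        # Deletes the temporary clone.
        self._tmp.cleanup()
```

Usage would look like `repo = ClonedRepo("ssh://git@...")`, then `repo.ls()` and `repo.open("some/file")`, then `repo.close()`. The sketch does not guard against symlinks inside the repo that point outside the clone.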

mike0sv, Dec 08 '21 15:12

> In mlem we need an fsspec implementation for remote git repos.

Can you expand on the specific things you need from remote git repos? Do you need to be able to modify files (and write new commits with those changes)? Or do you just need read-only access (listing files in the repo and/or being able to do open(file, mode='r'))? @mike0sv

> The easiest way imo would be to just clone the repo to a temporary dir

AFAIK this is really the only possible (universal) solution. Outside of hosts like GitHub/GitLab, which have built additional APIs on top of their own web services, there is no way to get this information from a remote git repo/server other than requesting (git fetching) the specific commit and tree objects you want and reading them yourself locally (i.e. cloning the repo).

> and then delegate to LocalFileSystem (with some path hacking)

This part is not actually required, at least in a read-only scenario. A read-only remote GitFS implementation wouldn't need the local checkout, just the fetched commit/tree objects, which can already be parsed directly using our existing git/scm tree/fs functionality. Basically we just need the equivalent of a bare repo; we don't need the local working tree for anything as long as access is read-only. Ideally we would also use shallow fetching when possible, to get only the specific objects/commits we care about rather than the entire repo history.
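As a rough illustration of the bare-repo approach, the sketch below shallow-fetches a single revision into a temporary bare repository and reads files straight from the fetched tree and blob objects, with no working tree. It uses plain `git` plumbing commands via `subprocess` as a stand-in for the existing git/scm tree functionality mentioned above, so it is not scmrepo's API; all function names here are hypothetical.

```python
import subprocess
import tempfile


def git(git_dir, *args):
    """Run a git command against the bare repo at `git_dir`; return stdout bytes."""
    result = subprocess.run(
        ["git", "--git-dir", git_dir, *args],
        check=True,
        capture_output=True,
    )
    return result.stdout


def fetch_shallow(url, rev="HEAD"):
    """Create a temporary bare repo and fetch only the tip commit of `rev`."""
    git_dir = tempfile.mkdtemp(suffix=".git")
    subprocess.run(["git", "init", "--quiet", "--bare", git_dir], check=True)
    # --depth 1 skips history; the fetched commit is recorded as FETCH_HEAD.
    # Fetching by raw commit SHA only works if the server allows it.
    git(git_dir, "fetch", "--quiet", "--depth", "1", url, rev)
    return git_dir


def parse_ls_tree(raw):
    """Parse `git ls-tree -r -z` output into {path: (mode, type, sha)}."""
    entries = {}
    # Each NUL-terminated record is "<mode> <type> <sha>\t<path>".
    for record in raw.split(b"\0"):
        if not record:
            continue
        meta, path = record.split(b"\t", 1)
        mode, kind, sha = meta.decode().split(" ")
        entries[path.decode(errors="surrogateescape")] = (mode, kind, sha)
    return entries


def list_files(git_dir, rev="FETCH_HEAD"):
    """List every file in the fetched tree without checking anything out."""
    return parse_ls_tree(git(git_dir, "ls-tree", "-r", "-z", rev))


def read_file(git_dir, path, rev="FETCH_HEAD"):
    """Return the bytes of the blob at `path` in `rev`."""
    return git(git_dir, "cat-file", "blob", f"{rev}:{path}")
```

With this shape, `git_dir = fetch_shallow(url)` followed by `list_files(git_dir)` and `read_file(git_dir, "some/file")` covers the read-only listing and open cases; the caller is responsible for removing the temporary directory afterwards.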

pmrowla, Dec 15 '21 06:12

For now I think read-only will be enough. Maybe we'll want to write files in the future, but there are a lot of open questions, like how to write multiple files in one commit (transactions?) or how to specify commit messages.

mike0sv, Dec 15 '21 15:12