scmrepo
fs: working with remote git repos
In mlem we need to have an fsspec implementation for remote git repos. For now we rely on the built-in GithubFileSystem for GitHub, but it does not support git credentials, and we also want to support URLs like ssh://git@... or git://... that point to git repos.
The easiest way imo would be to just clone the repo to a temporary dir and then delegate to LocalFileSystem (with some path hacking), but I am no expert on git internals.
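A rough sketch of the clone-and-delegate idea, assuming the git CLI is on PATH. The helper names `clone_to_tempdir` and `to_local_path` are made up for illustration and are not part of mlem, scmrepo, or fsspec:

```python
import posixpath
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def clone_to_tempdir(url: str, rev: Optional[str] = None) -> Path:
    """Clone `url` into a fresh temporary directory and return that directory."""
    target = Path(tempfile.mkdtemp(prefix="gitfs-"))
    cmd = ["git", "clone", "--quiet"]
    if rev is not None:
        cmd += ["--branch", rev]  # --branch accepts a branch or tag name
    subprocess.run(cmd + [url, str(target)], check=True)
    return target


def to_local_path(clone_root: Path, repo_path: str) -> Path:
    """The "path hacking": map a repo-relative path onto the clone directory."""
    # Anchoring at "/" before normalizing makes ".." segments stop at the repo root.
    relative = posixpath.normpath("/" + repo_path).lstrip("/")
    if relative.split("/")[0] == ".git":
        raise FileNotFoundError(repo_path)  # keep git's own metadata hidden
    return clone_root / relative
```

Paths returned by `to_local_path` could then be handed to LocalFileSystem or plain `open()`. Cleaning up the temporary directory is left out of the sketch.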
> In mlem we need to have an fsspec implementation for remote git repos.
Can you expand on the specific things you need to know about remote git repos? Do you need to be able to modify files (and write new commits with those changes)? Or do you just need read-only access (listing files in the repo and/or being able to do open(file, mode='r'))? @mike0sv
> The easiest way imo would be to just clone the repo to a temporary dir
AFAIK this is really the only possible (universal) solution. Outside of services like GitHub/GitLab, which have built additional APIs on top of their own web services, there is no way to get this information from a remote git repo/server other than requesting (git fetching) the specific commit and tree objects you want and reading them yourself locally (i.e. cloning the repo).
> and then delegate to LocalFileSystem (with some path hacking)
This part is not actually required, at least in a read-only scenario. A read-only remote GitFS implementation wouldn't actually need the local checkout, just the fetched commit/tree objects, which can already be parsed directly using our existing git/scm tree/fs functionality. Basically we just need the equivalent of a bare repo. We don't need the local working tree for anything as long as access is read-only, and ideally we would also use shallow fetching when possible to get only the specific objects/commits we care about, rather than the entire repo history.
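To make the bare-repo idea concrete, here is a minimal sketch that shells out to the git CLI instead of using scmrepo's own tree/fs code. All function names are hypothetical, and it assumes `rev` is a branch or tag name, since many servers refuse to serve arbitrary commit SHAs:

```python
import subprocess
import tempfile
from typing import List


def shallow_fetch_commands(git_dir: str, url: str, rev: str) -> List[List[str]]:
    """Build the git invocations for a bare, depth-1 fetch of one revision."""
    return [
        ["git", "init", "--quiet", "--bare", git_dir],
        ["git", f"--git-dir={git_dir}", "fetch", "--quiet", "--depth", "1", url, rev],
    ]


def list_files_command(git_dir: str, treeish: str = "FETCH_HEAD") -> List[str]:
    # -r recurses into subtrees, -z separates names with NUL bytes
    return ["git", f"--git-dir={git_dir}", "ls-tree", "-r", "-z", "--name-only", treeish]


def read_file_command(git_dir: str, path: str, treeish: str = "FETCH_HEAD") -> List[str]:
    return ["git", f"--git-dir={git_dir}", "cat-file", "blob", f"{treeish}:{path}"]


def fetch_snapshot(url: str, rev: str) -> str:
    """Fetch one revision into a new bare repo; no working tree is created."""
    git_dir = tempfile.mkdtemp(prefix="gitfs-", suffix=".git")
    for cmd in shallow_fetch_commands(git_dir, url, rev):
        subprocess.run(cmd, check=True)
    return git_dir


def list_files(git_dir: str) -> List[str]:
    out = subprocess.run(
        list_files_command(git_dir), check=True, capture_output=True, text=True
    ).stdout
    return [name for name in out.split("\0") if name]


def read_file(git_dir: str, path: str) -> bytes:
    """Read a blob straight from the object store."""
    return subprocess.run(
        read_file_command(git_dir, path), check=True, capture_output=True
    ).stdout
```

With something like this, `list_files` and `read_file` would cover the listing and `open(file, mode='r')` cases, and only the objects for the requested revision get downloaded.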
For now I think read-only will be enough. Maybe we'll want to write files in the future, but there are a lot of open questions, like how to write multiple files in one commit (transactions?) or how to specify commit messages.