sapling FR: Support simultaneous Sapling and git repo

One limitation to adoption that I've noticed is that there's no apparent way to use Sapling with a repo that relies on git-lfs (https://git-lfs.github.com/).

This doesn't really seem possible given that Sapling precludes an actual git repo, but maybe a workaround is enabling a simultaneous git and Sapling repo? This would make it easier to try out Sapling as well, since users could go into their existing repo and initialize it for use with Sapling.

Nov 17 '22 01:11 xEtherealx

By "simultaneous" do you mean the ability to have a working copy that git understands, so you can run commands like git add or let git handle the .gitattributes for LFS files? Unfortunately, Git's and Sapling's working copy concepts (e.g. whether there is a staging area) and implementations (e.g. whether "add file" takes a long time if you have 1 million files in the working copy) are very different, so they cannot be made compatible easily or meaningfully.

You might find more context in this section.

Nov 17 '22 19:11 quark-zju

Thanks, I'd read that doc section but it wasn't helpful in understanding what the limitations here are -- I'd guess that git lfs uses git add under the hood, so if what you're saying is that it's not possible to have git and sapling effectively ignorant of each other, then I suppose that's a non-starter.

Nov 18 '22 01:11 xEtherealx

We had some discussion on this topic. While we understand that simultaneous git could be useful, we think it's too difficult to meaningfully implement or maintain. Besides, it could be confusing, since we want to push for a user interface that's different from Git's.

So I'm closing this as a wontfix.

Nov 22 '22 02:11 quark-zju

I have been reading about Sapling and was getting really excited about all of its cool features and about trying to sell it to coworkers until I realized that things like VS Code's GitLens won't work. I normally interact with git from the command line, so I plan on using sl anyway, but I think that's kind of a nonstarter for a lot of people even though all of the PR stack support, sparse clone features, etc are a huge leap forward.

Locally I've gotten this working by symlinking Sapling's internal git repo to .git in the working tree, setting core.bare to false, and then manually telling git to reset to whichever hash sl thinks is the current one whenever HEAD changes. I know that sl's git repo is an implementation detail and I shouldn't rely on it, but it seems to work well. If you do something like git commit or git checkout, sl doesn't follow along, but for my purposes right now that's fine.

I've also gotten sparse checkouts working correctly in git by enabling them for the repo and then updating git's sparse-checkout list on every HEAD change. This makes it so that git-based tooling doesn't report tons of files as deleted.

It's quite possible there are edge cases in this or other situations I haven't yet thought of, but from poking around in a demo repo I made, this seems to work quite well.

So in summary, one-time setup:

ln -s .sl/store/git .git
git config --local core.bare false
git config --local core.sparseCheckout true  # if using sparse checkout; otherwise running `git sparse-checkout set` will enable `extensions.worktreeConfig`, which breaks Sapling

then every HEAD change:

git reset $(sl id | tr -d '+')
[[ $(git config --get --type bool core.sparseCheckout) == true ]] && sl sparse cwd | grep -v '^- ' | cut -c3- | git sparse-checkout set --stdin
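For anyone wanting to try this, the two per-change commands above could be wrapped in a single function, roughly as sketched below. The name sync_git_to_sl is made up for illustration. The sketch makes the same assumptions as the commands above: that stripping '+' from the output of sl id gives a revision git can resolve, and that sl sparse cwd marks excluded entries with a leading "- ". It relies on the same unsupported internal layout, so treat it as a hack rather than a supported workflow.

```shell
#!/usr/bin/env bash
# Sketch only: re-sync git's view of the repo after Sapling moves HEAD.
# sync_git_to_sl is a hypothetical helper name, not part of Sapling or git.
sync_git_to_sl() {
    local rev
    # Same pipeline as in the comment above: take `sl id` and strip any '+'.
    rev=$(sl id | tr -d '+')
    # An empty result means sl failed or printed nothing; do not touch git.
    if [ -z "$rev" ]; then
        echo "sync_git_to_sl: could not determine current commit" >&2
        return 1
    fi
    # Default (mixed) reset: moves HEAD and the index, leaves files on disk alone.
    git reset --quiet "$rev" || return 1
    # Mirror the sparse profile only when git-side sparse checkout is enabled.
    if [ "$(git config --get --type bool core.sparseCheckout)" = true ]; then
        sl sparse cwd | grep -v '^- ' | cut -c3- | git sparse-checkout set --stdin
    fi
}
```

Calling sync_git_to_sl by hand after each sl goto or sl commit is the simplest way to use it. Running it automatically would need some trigger on the Sapling side.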

I guess what I'm wondering is if this issue could possibly be reconsidered, because allowing people to continue using their git-based tools (like GitLens, or GitHub like sl already does) would go a long way toward making adoption a no-brainer. For example, if Sapling is willing to stick with the internal git folder and update the HEAD pointer in it on every commit / goto (and possibly even update .git/info/sparse-checkout) then we could use all of our tools that rely on git being the version control system.

I would also be happy to try my hand at a pull request if something like that sounds acceptable!

Jan 02 '23 20:01 adzenith

Hi @adzenith, we understand the usefulness of the existing Git ecosystem, but there are some practical challenges. Let me try to explain in more detail. Feel free to ask questions if you have any.

First, we cannot meaningfully support .git/index today (needed for git status, git diff, git add, etc.). This is because Sapling does not have a "staging area" concept. We think Sapling's implementation might have a better time complexity, so sl add in a repo with millions of files might be faster than git add, especially once we make it possible to use our virtualized filesystem, as we plan to. Implementing the staging area feature properly is non-trivial, and implementing something with a worse time complexity would be difficult to justify internally.

Second, it'd be difficult for us to maintain .git/ compatibility alongside our future scalability goals. Those goals mainly include the virtualized file system (edenfs) and a lazy commit graph with customized protocols. These features already run in production internally but do not play nicely with .git/:

Our lazy commit graph does not have a Git equivalent - there is a full graph represented in compact integer vertexes, then commit hashes and messages are fetched on demand. There is currently no way to describe our lazy commit graph in .git file format that other git implementations can understand.
We want to store locally generated and remotely fetched commits separately, with the assumption that remotely fetched commits can be re-fetched so they can be stored in a LRU cache to keep space usage bounded.
It'd be nice if we can use our in-house data structures that do not require repack to maintain logN lookup time complexity.
Even if we still use .git/ format somehow, it's unclear if we can make git fsck happy or keep git gc from deleting useful data.

Third, allowing users to run git commands directly in the repo would bypass Sapling's assumptions and cause confusion. For example:

A user might run git fetch <url> and the default behavior of fetching all remote references is undesirable for Sapling.
A user might run git tag foo, but Sapling does not deal with local tags.

Given the above considerations, we don't think there is a way to provide .git/ compatibility while maintaining a meaningful UX for regular users (e.g. they might ask "why don't git status/add/diff work?" and there are no good answers or fixes) and/or supporting our future scalability plans.

That said, I think compatibility at the command-line layer, rather than the raw file-format layer, is something we're interested in. More specifically, a tool named git that:

Runs the original git in a .git/ repo.
Tries to translate the command to sl equivalent and run it in a .sl/ repo. Might be similar to the githelp command. You'll need to check what commands GitLens uses and ensure they translate well.

For lower startup overhead, the tool might be implemented in Rust and avoid heavyweight dependencies.
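To make the shape of such a tool concrete, here is a minimal shell sketch of the dispatch described above (shell rather than Rust purely for brevity). Everything in it is an assumption for illustration: the function names, the REAL_GIT variable and its /usr/bin/git default, and the tiny translation table. Only the subcommand name is translated. Flags are passed through untouched and output formats are not converted, so a real tool would need far more than this.

```shell
#!/usr/bin/env bash
# Sketch only: a "git" front end that dispatches on the kind of repository.
# All names here (find_repo_kind, run_as_sl, git_compat, REAL_GIT) are hypothetical.

# Print "git", "sl", or "none" for the nearest repo at or above directory $1.
find_repo_kind() {
    local dir
    dir=$(cd "${1:-.}" && pwd -P) || return 1
    while :; do
        # -e rather than -d because .git can be a file or a symlink.
        if [ -e "$dir/.git" ]; then echo git; return 0; fi
        if [ -d "$dir/.sl" ]; then echo sl; return 0; fi
        if [ "$dir" = / ]; then break; fi
        dir=$(dirname "$dir")
    done
    echo none
}

# Translate a very small, illustrative subset of git subcommands to sl.
run_as_sl() {
    if [ $# -eq 0 ]; then return 2; fi
    local sub=$1
    shift
    case "$sub" in
        status|diff|log) sl "$sub" "$@" ;;   # same subcommand name in sl
        checkout)        sl goto "$@" ;;     # sl uses goto to move between commits
        *) echo "git-compat: unsupported subcommand: $sub" >&2; return 2 ;;
    esac
}

# Entry point: real git in a .git/ repo, translation in a .sl/ repo.
git_compat() {
    case "$(find_repo_kind .)" in
        git) "${REAL_GIT:-/usr/bin/git}" "$@" ;;  # path to the real git is assumed
        sl)  run_as_sl "$@" ;;
        *)   echo "git-compat: not inside a .git/ or .sl/ repo" >&2; return 128 ;;
    esac
}
```

Installed as a script named git ahead of the real one on PATH, the last line would be git_compat "$@". Unsupported subcommands fail loudly here instead of guessing, which makes it easier to see which calls a tool such as GitLens actually makes.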

Jan 03 '23 20:01 quark-zju

@quark-zju thanks for the detailed explanation. It sounds like sl's internal .git folder really isn't suitable for this purpose. I do like your idea of a replacement git tool, though, that calls through to sl similar to githelp. That sounds neat, and much less brittle than the hack I had been trying out.

Jan 04 '23 10:01 adzenith

I am trying out making a git wrapper script that calls into sl, and I'm not sure it's a reliable approach unless there's a whole lot of engineering investment into it. There's a ton of commands that Git supports, all of which have flags that subtly or dramatically affect the output, and the moment someone upstream changes what call they're making to something unsupported by the wrapper it will immediately break. I'm pretty confident I can get to something that'll make all GitLens features work (I've already tricked VS Code into believing that the Sapling repo supports Git), but essentially what has to happen here is a full-featured emulation layer.

I guess the real answer would be to fix this on the tooling side (e.g. in GitLens), but it's a sort of chicken-and-egg problem where I would think people would want to use the tool with the best support and also want to make tooling support for the most-used tool.

As I mentioned, I for one will be trying out sl, but this complication is a hurdle to switching whose size really depends on the dev and how much they rely on things like GitLens. My hope is that the Git remote support is enough to allow sl to take off, because each dev can decide for themselves if they want to switch.

Thanks!

Jan 04 '23 14:01 adzenith

> There's a ton of commands that Git supports, all of which have flags that subtly or dramatically affect the output, and the moment someone upstream changes what call they're making to something unsupported by the wrapper it will immediately break.

It seems there are 2 kinds of use-cases:

Human interaction. Any feature from git might be used.
Automation, such as GitLens, other IDE integrations, git-repo, etc. Usually only a subset of git features is used.

Covering the human interaction use-cases is quite complex as you mentioned, but covering the automation use-cases seems practical and provides value. We can document that the wrapper is for automation compatibility, not for everyday consumption, or even refuse to run if the output is a tty. It could start with just GitLens compatibility, then extend to other tools later.

For human interaction use-cases, I think we want to recommend Sapling's UX instead of git's so it's okay or even preferred if the wrapper does not support it.
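The "refuse to run if the output is a tty" idea could look something like the following sketch. The function name and the message text are invented for illustration. The only mechanism used is the standard shell test -t, which checks whether a file descriptor is attached to a terminal.

```shell
#!/usr/bin/env bash
# Sketch only: guard for an automation-only wrapper.
# refuse_if_interactive is a hypothetical helper name.
refuse_if_interactive() {
    # [ -t 1 ] is true when stdout (file descriptor 1) is a terminal,
    # which usually means a person rather than a tool is reading the output.
    if [ -t 1 ]; then
        echo "this git wrapper is for tool compatibility only; please use sl directly" >&2
        return 1
    fi
    return 0
}
```

A wrapper would call refuse_if_interactive || exit 1 before doing anything else. Tools that capture output through a pipe pass the check, while someone typing the command at a prompt is pointed to sl.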

Jan 04 '23 16:01 quark-zju

@adzenith I arrived at a solution very similar to yours, where it's synced automatically via sapling hooks, see https://github.com/facebook/sapling/issues/745. The git repo is initialized with git init --separate-git-dir .sl/store/git ..

Since git shares a working directory with sapling in this setup, we can avoid a lot of work by just calling git update-ref HEAD and then git read-tree HEAD to update the index, without touching every file in the repo.
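As a rough sketch of those two steps (not the actual script from the linked issue), the sync could be written as below. The function name is made up, the commit hash is passed in by the caller, and how the function gets invoked from a Sapling hook is not shown in this thread, so it is left out.

```shell
#!/usr/bin/env bash
# Sketch only: cheap HEAD sync for a git dir that shares Sapling's working copy.
# sync_head_fast is a hypothetical helper name; $1 is the commit Sapling is on.
sync_head_fast() {
    if [ -z "${1:-}" ]; then
        echo "usage: sync_head_fast <commit-hash>" >&2
        return 2
    fi
    # Point HEAD (or the branch HEAD refers to) at the given commit.
    git update-ref HEAD "$1" || return 1
    # Rebuild the index from that commit's tree. Without -u, read-tree does not
    # write to the working tree, so files Sapling already checked out are untouched.
    git read-tree HEAD
}
```

One possible way to call it, borrowing the sl id pipeline from earlier in the thread, is sync_head_fast "$(sl id | tr -d '+')".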

In my case we have a large amount of tooling built on git and could not take the risk of behavior differences from translating to sapling commands, so we would need a real git checkout. For all the cases I care about, the git usage is read-only, if that helps.

Before I ran across .sl/store/git I was pushing to a completely separate shadow git checkout, but was not able to get this to run automatically in hooks because I couldn't sl push while the repo lock is held. I ended up with a git wrapper script that did this lazily, and it seemed to work but is slow. Doing it in hooks is much nicer.

I understand the desire to decouple from the git storage format but I think lack of a read-only git interface is going to block a lot of adoption.

How will github PR submit work when there's no local git store? I wonder if we could leverage that somehow.

Oct 20 '23 16:10 ahupp