stgit icon indicating copy to clipboard operation
stgit copied to clipboard

[Upstream Issue] Libgit2 and transitively Stacked Git do not support sparse checkouts

Open NonLogicalDev opened this issue 2 years ago • 16 comments

Upstream Issues:

  • https://github.com/libgit2/libgit2/issues/6044
  • https://github.com/libgit2/libgit2/issues/2263

Adding this issue to track upstream issues of LibGit2 related to sparse checkout support.

Our company is using a huge monorepo, and we have recently onboarded onto Git Sparse checkouts, which seem to be unsupported in LibGit2 which Rust version is using.

Despite my attempts at doing custom builds with some PRs I found online for LibGit2, although I could get the Read Only workflow to succeed like stg series.

I could not get the stg push to work as it seems like sparse index is throwing LibGit2 way off.

NonLogicalDev avatar Jul 29 '22 22:07 NonLogicalDev

Thanks for writing up this issue, @NonLogicalDev.

It would be nice if libgit2 supported sparse checkout natively. That said, it doesn't seem like that project has the will to take on the risk of sparse checkout any time soon.

When reimplementing StGit in Rust, a key design decision was how to interact with the git repository. Initially, I tried orienting almost entirely to libgit2--it is fast and its abstractions are pretty nice. However, as I implemented more and more of StGit, I ran into a bunch of compatibility problems; tests would mostly pass, but various edge cases would fail. Patch application and merge resolution (in various contexts) just don't work correctly.

So where libgit2 didn't get the job done, I reverted to using git commands.

Coming back to sparse checkouts, it might be that there is yet another subset of operations that StGit is currently doing with libgit2 that could be converted to git commands that would enable StGit to just work with sparse checkouts. I think this is a more likely path to success versus waiting for libgit2 to gain sparse checkout support.

Perhaps the line we can draw for libgit2's use in StGit is to only use it as an object and configuration database. I.e. looking up refs, commits, trees, blobs, and config, but any significant mutations of repository or working tree state should be done by git proper. At one point, I had done the work to entirely eliminate libgit2 as a dependency (I wrote an entire replacement stupid2 library), but the runtime performance was significantly worse than the current hybrid approach.

I can think of a couple of concrete next steps for this issue:

  1. Define some sparse checkout test cases that we want to work, covering at least new, refresh, push, and pop.
  2. Audit for the remaining places where StGit uses libgit2 for checkout and tree comparisons.

jpgrayson avatar Aug 15 '22 17:08 jpgrayson

If you do not use worktrees for your sparse checkout, the documentation implies that it should be fine to delete that configuration option (git config --unset extensions.worktreeConfig). See https://git-scm.com/docs/git-worktree#_configuration_file

A coworker of mine has a fork of Rust's git2 which implements (ignores? not sure) the extensions.worktreeConfig option: https://github.com/bierbaum/git2-rs

arxanas avatar Aug 16 '22 17:08 arxanas

@arxanas, you seem to be pointing out a detail related to how the sparse checkouts feature intersects with the multiple worktrees feature, which is interesting and potentially useful toward the intersection of those two features in StGit, but does not provide a solution path for StGit support of sparse checkouts. I just want to make sure I'm not missing something here.

jpgrayson avatar Aug 16 '22 20:08 jpgrayson

@jpgrayson For the end user, you can restore StGit support for sparse checkouts, assuming you don't also use multiple worktrees for that same checkout, by deleting the problematic extensions.worktreeConfig configuration key. If you don't happen to use both features together, then you can get StGit to work on your work monorepo's sparse checkout again.

Or you as the developer can update StGit to use the forked git2, which would at least let you ignore the unsupported extensions.worktreeConfig option. Then you can move forward with writing the tests and auditing the checkout/diff sites.

arxanas avatar Aug 16 '22 20:08 arxanas

My assumption is that StGit has plethora of problems with sparse checkouts independent of its intersection with multiple worktrees.

I had performed a quick test scenario where I enabled sparse checkouts on a small subset of a linux kernel repo using a vanilla/default worktree. The patch resulting from stg new --refresh contained deletions for all the thousands of files that were outside my sparse checkout cones. This is an obvious indicator that StGit does not work properly on a sparse checkout, even without using multiple worktrees.

Performing that same scenario again, but with extensions.worktreeConfig unset yields the same problem. AFAICT, StGit cannot do even the most basic operations in the context of a sparse checkout, regardless of whether multiple worktrees is in use.

From my point of view, there are mostly orthogonal paths toward solving sparse checkouts with StGit and multiple worktrees with StGit, and solving the major sparse checkouts issues does not require using a libgit2 patched to comprehend per-worktree configs. I still feel like I may be missing a point you're trying to make, @arxanas, so apologies if I'm not understanding your point.

jpgrayson avatar Aug 16 '22 20:08 jpgrayson

@arxanas I had the same experience as @jpgrayson when testing on my repo, I tried both building the PR that ignored per Worktree Config extension, and just un-setting the WorkTree Config Extension enable switch. In read only the experience was butter smooth, like it ought to be, the problems start when you try to interact with the actual index. I.e. refresh commands.

At that point LibGit really goes off the rails, as it thinks that all of the excluded files from the sparse index are deleted. I wish the fix was that simple. I almost believed myself that it was that simple, but then the reality rudely hit me in the face. =]

It is a difficult tradeoff to make, either support partial functionality (i.e. do not support the sparse checkouts) and keep the blazing speed of LibGit. Or take the hit of doing execs... Or shudders have a hybrid approach that can switch on the fly based on some signals, like if the sparse checkout is configured. (but that nearly double the work to support 1.5 codebases instead of just 1)

I really like the rust version, it is so darn fast! Big props @jpgrayson. I just wish I could use it for work haha.

NonLogicalDev avatar Aug 16 '22 22:08 NonLogicalDev

Sorry, I misunderstood the scope of this issue to be about crashing only. My own project git-branchless more or less worked after deleting extensions.worktreeConfig, so I was expecting the same here.

I use the hybrid approach where libgit2 is mostly an object database, and I shell out to git for checkout/status/commit/etc. operations. (There are other reasons to do this too. For example, libgit2 doesn't run Git hooks, so its implementation of status won't make use of the fsmonitor, which is a significant performance hit on large repos.)

Feel free to copy any code from there which you find useful. Maybe of note:

  • Get status: https://github.com/arxanas/git-branchless/blob/996c54b72fc212392e4ecd14c68d5152f4cde286/git-branchless-lib/src/git/repo.rs#L631-L638
  • Create commit from status entries: https://github.com/arxanas/git-branchless/blob/996c54b72fc212392e4ecd14c68d5152f4cde286/git-branchless-lib/src/git/snapshot.rs#L304-L308
  • Cherry pick fast: https://github.com/arxanas/git-branchless/blob/996c54b72fc212392e4ecd14c68d5152f4cde286/git-branchless-lib/src/git/repo.rs#L904-L919
  • Amend fast: https://github.com/arxanas/git-branchless/blob/996c54b72fc212392e4ecd14c68d5152f4cde286/git-branchless-lib/src/git/repo.rs#L1097-L1109=

and solving the major sparse checkouts issues does not require using a libgit2 patched to comprehend per-worktree configs

Well, I think you will still need to deal with the fact that git sparse-checkout set automatically sets the problematic extensions.worktreeConfig config variable, which will cause your calls to git2 to failing, even if you only use it as an object database.

arxanas avatar Aug 17 '22 02:08 arxanas

Thanks for this additional context, @arxanas. And thanks for the links to git-branchless. I've started looking at some of the code. Looks like a very cool project and it's helpful to see your approaches to similar problems.

jpgrayson avatar Aug 17 '22 16:08 jpgrayson

I've done some work on this problem. See above commits. For all the use cases I've tried, StGit now seems to work in a sparse checkout worktree. I'd love to know if this works for your use cases, @NonLogicalDev. Maybe checkout the new t7000-sparse-checkout.sh tests as well.

Some additional details:

The strategy was to use git instead of libgit2 for more operations, namely:

  • Status checks
  • Checkouts
  • All index operations

These changes do seem to affect runtime performance, at least as measured by how long the test suite takes. Perhaps on the order of 10%. This seems like a small price to pay for correctness. Also, these changes shouldn't have much/any affect on the most latency-sensitive StiGit commands, such as stg status, stg id, stg top. It's the commands that mutate repository state that now have a bit more overhead. I'll also note that for regular/large repos (as opposed to the tiny repos used in the test suite), the overhead of running more subordinate git processes will be less relevant and the git versions of the various operations may very well be faster at-scale than what we were doing with libgit2.

jpgrayson avatar Aug 24 '22 23:08 jpgrayson

@jpgrayson ❤️ , let me get to testing this right away!

NonLogicalDev avatar Aug 26 '22 21:08 NonLogicalDev

I think I may have uncovered another issue:

> stg lg
error: Malformed version 4 meta: missing Previous

> stg version
Stacked Git 2.0.0-beta.2 (2f0a3980b)
Copyright (C) 2005-2022 StGit authors
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
SPDX-License-Identifier: GPL-2.0-only
git version 2.36.1

NonLogicalDev avatar Aug 26 '22 21:08 NonLogicalDev

Strange one:

> git cat-file -p refs/stacks/master:stack.json
{
  "version": 5,
  "prev": "78ca357f4edf94549b0e91146076ba0f6b305f3c",
  "head": "2f0a3980b9b10721cd1d6c1e32c078438b894dae",
  "applied": [],
  "unapplied": [
    "notes-copy",
    "silly",
    "update-index-error-fix",
    "feature-edit-patch-update"
  ],
  "hidden": [
    "branch-rename-tests"
  ],
  "patches": {
    "notes-copy": {
      "oid": "a3c04fdcb7135f6b7f74e5b8747d04e6c28acd1e"
    },
    "silly": {
      "oid": "7c04ba132f1b7083670a89bcc61060cebc5f80aa"
    },
    "update-index-error-fix": {
      "oid": "d7a2b12e477e3558736cd197552b6e85162b7d82"
    },
    "feature-edit-patch-update": {
      "oid": "7077272df01e0cb9dc4dde015b469b98b93f61fb"
    },
    "branch-rename-tests": {
      "oid": "1e19a8c6bf879986ee9bacd07eba72102467d39b"
    }
  }
}

NonLogicalDev avatar Aug 26 '22 21:08 NonLogicalDev

I will cut a separate issue with that, I think I just happened to have old master.stgit ref sticking around.

NonLogicalDev avatar Aug 26 '22 21:08 NonLogicalDev

The stack upgrade path (which is where the "Malformed version 4 meta" error originates) is triggered by the presence of the refs/heads/master.stgit ref. So although you seem to have version 5 metadata (i.e. because refs/stacks/master exists and contains stack.json, your master branch also seems to have version 4 metadata. It is unclear to me how a branch would end up in this state via normal StGit operations.

I'll note that stack metadata version 4 was introduced in StGit 1.0 and stack metadata version 5 was introduced in StGit 1.2.

Also, I believe this metadata issue is orthogonal to sparse checkout. You should be able to run StGit on a new branch to test sparse checkout. And you can probably recover from the metadata problem by deleting refs/heads/master.stgit.

jpgrayson avatar Aug 26 '22 21:08 jpgrayson

@jpgrayson I played around with new stg version on our monorepo with sparse checkouts. So far so good. push / pop seem to be working as expected.

But I had to disable git config --unset extensions.worktreeConfig to have it work otherwise I get our good old friend:

> stg series
error: unsupported extension name extensions.worktreeconfig; class=Repository (6)

I will need to do further testing if sparse checkout related commands still work with that setting un-set.

NonLogicalDev avatar Aug 26 '22 21:08 NonLogicalDev

Thanks for the note about extensions.worktreeConfig. I guess this is yet another corner of the libgit2 API that StGit should perhaps avoid.

jpgrayson avatar Aug 26 '22 23:08 jpgrayson

AFAIK StGit is doing okay with sparse checkouts at this point. Closing this issue. We can reopen as necessary.

jpgrayson avatar Nov 01 '22 19:11 jpgrayson

The vanilla STG still does not work:

> stg series
error: unsupported extension name extensions.worktreeconfig; class=Repository (6)

I have created a PR to fix this with a workaround. It has been working without issues for us at my company for the past few months.

NonLogicalDev avatar Nov 15 '22 23:11 NonLogicalDev