
octopus-merge (part 5: tree-merge-ORT three-way)

Open • Byron opened this issue 1 year ago • 1 comment

Three-way merging of trees.

Follow-up of #1612.

Tasks

  • [x] baseline tests for tree-merging
  • [x] successful first tree-merge without conflicts
  • [ ] make all tests pass (catch-all task that will be refined on the fly)
    • [x] fully parse all information provided by Git in baseline
    • [x] update the conflict-lookup table to reflect changes to the tree as well, to avoid double or missed conflicts, for instance when something clashes with a newly added rename.
    • [x] conflict-tree by index to allow getting 'next' of ours for lookahead
    • [x] tree::Editor::get() to find unique path names
    • ~~multi-tree traversal (without wildcard support!)~~ - for now, let's just do it 'the simple' way and perform two diffs
    • optimization idea for when there are numbers: a 'brute force' implementation that uses threads would benefit from being able to reuse the object caches of a repo that has already seen the base tree, but overall, who knows.
    • there are tests for merging merge-bases in diff3-conflict-markers
    • diff3-conflict-markers.sh - be sure to capture the 'empty tree' label, but also other special cases
    • see also all the merge options
  • [x] fix reversed tests for blobs
  • [x] fuzzing for blob merges
  • [ ] assure that each case has a motivating test!
  • [ ] remove TODOs
  • [ ] Repository::merge_trees
  • [ ] A trivial gix merge tree implementation, based on commits. Maybe create something to easily merge multiple commits while at it (in gix).
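The 'simple' approach chosen above, performing two diffs instead of a multi-tree traversal, can be sketched as follows. This is a minimal illustration, not gix code: it assumes trees are flattened into path → blob-id maps and ignores renames, file modes, and nested trees. `FlatTree`, `Outcome`, `diff`, and `merge_trees` are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical simplification: a tree flattened to `path -> blob id`.
type FlatTree = BTreeMap<String, String>;

/// What happens to a single path in the merged result.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The path ends up pointing at this blob id.
    Resolved(String),
    /// The path is absent from the result.
    Deleted,
    /// Both sides changed the path differently (`None` = deleted on that side).
    Conflict { ours: Option<String>, theirs: Option<String> },
}

/// Changes from `base` to `side`: `Some(id)` = added/modified, `None` = deleted.
fn diff(base: &FlatTree, side: &FlatTree) -> BTreeMap<String, Option<String>> {
    let mut changes = BTreeMap::new();
    for (path, id) in side {
        if base.get(path) != Some(id) {
            changes.insert(path.clone(), Some(id.clone()));
        }
    }
    for path in base.keys() {
        if !side.contains_key(path) {
            changes.insert(path.clone(), None);
        }
    }
    changes
}

fn to_outcome(change: &Option<String>) -> Outcome {
    match change {
        Some(id) => Outcome::Resolved(id.clone()),
        None => Outcome::Deleted,
    }
}

/// Three-way merge via two diffs: base -> ours and base -> theirs.
fn merge_trees(base: &FlatTree, ours: &FlatTree, theirs: &FlatTree) -> BTreeMap<String, Outcome> {
    let ours_changes = diff(base, ours);
    let theirs_changes = diff(base, theirs);
    let paths: BTreeSet<&String> = base.keys().chain(ours.keys()).chain(theirs.keys()).collect();

    let mut result = BTreeMap::new();
    for path in paths {
        let outcome = match (ours_changes.get(path), theirs_changes.get(path)) {
            // Unchanged on both sides, so the path must exist in base.
            (None, None) => Outcome::Resolved(base[path].clone()),
            // Only one side touched the path: take its version.
            (Some(change), None) | (None, Some(change)) => to_outcome(change),
            // Both sides made the identical change.
            (Some(a), Some(b)) if a == b => to_outcome(a),
            // Both sides changed the path differently.
            (Some(a), Some(b)) => Outcome::Conflict { ours: a.clone(), theirs: b.clone() },
        };
        result.insert(path.clone(), outcome);
    }
    result
}
```

Passing an empty map as `base` corresponds to merging against the empty tree when no merge-base exists; two different additions of the same path then surface as a conflict. A real implementation would also have to track renames so that a change on one side can be applied to the renamed location on the other.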

Next PR / Outscoped

  • Submodule merges are also possible! Maybe outscope it though! libgit2 also doesn't try it.
  • textconv with context, see this gist for details.
    • There seem to be different 'tiers' of tools, some don't get GIT_DIR set, others do.
    • It also seems that diff-programs get too much context right now, but that depends on how much is passed to them by the caller as gix-command::Context.
  • How to model virtual-merge-bases? Can be none or many, user should have control over how this is done.
  • Actual tree-based merging
  • ~~Make blob-merge based conflict handling possible in the tree merge from gix at least.~~ - not needed for now
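One possible answer to the open question of how to model virtual merge-bases follows the behaviour described in the research notes: no merge-base means the empty tree, one merge-base is used as-is, and several are merged pairwise into a virtual one, recursively. The sketch below is hypothetical (`virtual_base` is not a gix or Git function) and generic over the tree type. The merge-bases of two bases and the three-way merge itself are caller-supplied callbacks, one way to leave control with the user; in Git the bases of bases come from the commit graph.

```rust
/// Hypothetical: reduce a list of merge-bases to a single (possibly virtual) base.
///
/// * `empty` is used when there is no merge-base at all (the empty tree).
/// * `bases_of(a, b)` returns the merge-bases of two bases (a commit-graph lookup in Git).
/// * `merge3(base, ours, theirs)` performs the three-way merge.
fn virtual_base<T: Clone>(
    bases: &[T],
    empty: &T,
    bases_of: &dyn Fn(&T, &T) -> Vec<T>,
    merge3: &dyn Fn(&T, &T, &T) -> T,
) -> T {
    match bases {
        [] => empty.clone(),
        [only] => only.clone(),
        [first, rest @ ..] => rest.iter().fold(first.clone(), |acc, next| {
            // The base for merging two merge-bases is itself computed recursively.
            let inner = virtual_base(&bases_of(&acc, next), empty, bases_of, merge3);
            merge3(&inner, &acc, next)
        }),
    }
}
```

In a toy model where a "tree" is a counter and a merge applies both sides' deltas (`ours + theirs - base`), three merge-bases `3, 5, 7` with no bases of their own reduce to `15`.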

Research

Everything is about MergeORT.

  • merge options passed with -X, like -X ours, don't affect tree-related auto-resolutions, only content-related ones. This could be implemented if there is demand, though.
  • it uses an empty tree if there is no merge-base - we must allow the same.
  • it allows for multiple merge-bases, creating a virtual one by merging all merge-bases together using the same algorithm, recursively.
  • merges can have conflicts without any individual files being involved, for instance when directory renames clash
  • Note that merge-ORT cannot properly handle renames within renamed directories, ending up with the source of the subdir-rename still present, for example:
❯ git ls-tree -r $(git merge-tree main feat)
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    a
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sec-renamed/2
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sec-renamed/7
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sec-renamed/subdir/6
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sec/subdir-renamed/6
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sequencer
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    gix/5
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    h
  • Must make sure that all possible types of conflicts are properly communicated, so that no information is lost
  • It puts conflict-markers in the blobs of the result tree, with annotations to provide additional context
  • Need resolution configuration, see git2::MergeOptions.
  • data is stored by path, with paths interned in the map to allow pointer-based comparisons
    • merge-info with everything one needs to know, also related to renames
    • or conflict information
    • it uses a memory-pool/arena to get memory for many paths all at once (and also release it like that)
  • paths start out as conflicted, and can later be changed to non-conflicting if a content-based merge succeeded.
    • If conflicts remain, the metadata is used to produce an 'as merged as possible' version with conflict markers that can be checked out to the working tree.
  • hunks can partially overlap, but can also be resolved line-by-line to some extent.
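The bookkeeping described in the research notes, paths interned so they can be compared by pointer and every path starting out as conflicted until a content merge succeeds, could look roughly like this. It is a minimal sketch with hypothetical names (`PathInterner`, `PathState`, `MergeState`); `Rc<str>` stands in for the memory-pool/arena allocation merge-ORT uses, and blob ids are plain strings.

```rust
use std::collections::{HashMap, HashSet};
use std::rc::Rc;

/// Hypothetical interner: each distinct path is allocated once, and interning
/// the same path again returns a clone of the same `Rc<str>`.
#[derive(Default)]
struct PathInterner {
    paths: HashSet<Rc<str>>,
}

impl PathInterner {
    fn intern(&mut self, path: &str) -> Rc<str> {
        if let Some(existing) = self.paths.get(path) {
            return Rc::clone(existing);
        }
        let interned: Rc<str> = Rc::from(path);
        self.paths.insert(Rc::clone(&interned));
        interned
    }
}

/// Per-path state: every touched path starts out conflicted.
#[derive(Debug, Clone, PartialEq)]
enum PathState {
    Conflicted,
    /// A content merge succeeded; holds the (hypothetical) id of the merged blob.
    Merged(String),
}

#[derive(Default)]
struct MergeState {
    interner: PathInterner,
    states: HashMap<Rc<str>, PathState>,
}

impl MergeState {
    /// Register a path touched by the merge; new paths start as `Conflicted`.
    fn touch(&mut self, path: &str) -> Rc<str> {
        let path = self.interner.intern(path);
        self.states.entry(Rc::clone(&path)).or_insert(PathState::Conflicted);
        path
    }

    /// Mark a path as non-conflicting after a successful content merge.
    fn resolve(&mut self, path: &Rc<str>, merged_blob: &str) {
        if let Some(state) = self.states.get_mut(path) {
            *state = PathState::Merged(merged_blob.to_string());
        }
    }
}
```

Two handles for the same path can then be compared with `Rc::ptr_eq` instead of comparing string contents; map lookups still hash the path text.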
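Blobs in the result tree carry Git-style conflict markers, and the diff3 style adds a section for the merge-base, whose label may be 'empty tree' as noted in the task list. A minimal sketch of rendering one conflicting region follows, assuming whole-region conflicts without hunk splitting and that each side's text already ends in a newline. `conflict_markers` is a hypothetical helper, not Git's or gix's implementation.

```rust
/// Hypothetical helper: render one conflicting region with 7-character markers.
/// `base` is only emitted for the diff3 style; `labels` are (ours, base, theirs).
fn conflict_markers(ours: &str, theirs: &str, base: Option<&str>, labels: (&str, &str, &str)) -> String {
    let (ours_label, base_label, theirs_label) = labels;
    let mut out = String::new();
    out.push_str(&format!("<<<<<<< {ours_label}\n"));
    out.push_str(ours);
    if let Some(base) = base {
        // diff3 style: show what the merge-base contained.
        out.push_str(&format!("||||||| {base_label}\n"));
        out.push_str(base);
    }
    out.push_str("=======\n");
    out.push_str(theirs);
    out.push_str(&format!(">>>>>>> {theirs_label}\n"));
    out
}
```

With an empty base and the label `empty tree`, the diff3 section consists of the `|||||||` line alone, which is one of the special cases the baseline tests need to capture.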

Byron, Oct 09 '24 07:10

As detailed in #1623, which provides a fix, the failure observed here in the CI test job is actually not due to any of the changes in this PR, and also occurs if CI is re-run on the tip of main. It is instead due to the upgraded runner image not having the headers needed for building with -llzma, which is needed for the xz feature of gix-testtools. This is also entirely unrelated to #1622, which does not yet affect CI.

Merging #1623 and then rebasing this onto main should fix the test failure here. The other failure here is in the lint job and unrelated.

EliahKagan, Oct 13 '24 21:10