jsdiff icon indicating copy to clipboard operation
jsdiff copied to clipboard

3-way merge

Open uiteoi opened this issue 8 years ago • 14 comments

This is a feature request.

Given a common ancestor and 2 sets of patches, provide a merged document that may include conflicts to resolve manually.

This would be useful to allow two or more users collaborate on the edition of a document, implementing an equivalent of git merge.

It is assumed that the most recent common ancestor is know and provided as a parameter:

JsDiff.merge( commonAncestor, x_patches, y_patches [, options] )

Where x_patches and y_patches are Arrays of patches for the x and y branches from the common ancestor,

Options may be used to provide a manual merge patterns which could default to:

mergePattern: [ "<<<<<<< x", "=======", ">>>>>>> y" ]

For the algorithm, see https://www.quora.com/How-does-Git-merge-work.

uiteoi avatar Jun 05 '17 12:06 uiteoi

This was implemented but not made part of the top-level API https://github.com/kpdecker/jsdiff/commit/f94da90517beab9e9880890a6b32792737f0a6c2

gonzofish avatar Jun 21 '17 18:06 gonzofish

@gonzofish how to write it to file like git does?

goodmind avatar Feb 07 '19 13:02 goodmind

sorry @goodmind, I don't remember, I haven't used this library in over a year...sorry I cannot be more help

gonzofish avatar Feb 07 '19 14:02 gonzofish

@kpdecker

goodmind avatar Feb 07 '19 14:02 goodmind

I have done it, but I found merge misbehaving bug Basically what I did was hand-process the "+", "-" " " prefix of the lines that result from calling merge, and handle the conflict blocks that look like { conflict: true, mine: ['#yes'], theirs: ['#no']} interspersed into the string lines of the patch. You write >>>>>>>> (emit the 'theirs' lines) ======== (emit 'ours' lines) <<<<<<<<

doug65536 avatar Jan 25 '22 23:01 doug65536

Hmm. Just started looking at the merge.js code and besides the already-reported bugs I see more to dislike; loadPatch seems to heuristically detect if the passed param is a patch or not by looking for substrings like @@, and change its behaviour accordingly, which seems obviously fragile/incorrect given that you could legitimately want to diff two files that contain those substrings. An API that depends upon that kind of heuristic to determine how to treat its inputs is fundamentally bad, IMO.

ExplodingCabbage avatar Jun 24 '24 15:06 ExplodingCabbage

I just had another few minutes to look at this. I've been reading the key bit of logic here and trying to reason about it:

  while (mineIndex < mine.hunks.length || theirsIndex < theirs.hunks.length) {
    let mineCurrent = mine.hunks[mineIndex] || {oldStart: Infinity},
        theirsCurrent = theirs.hunks[theirsIndex] || {oldStart: Infinity};

    if (hunkBefore(mineCurrent, theirsCurrent)) {
      // This patch does not overlap with any of the others, yay.
      ret.hunks.push(cloneHunk(mineCurrent, mineOffset));
      mineIndex++;
      theirsOffset += mineCurrent.newLines - mineCurrent.oldLines;
    } else if (hunkBefore(theirsCurrent, mineCurrent)) {
      // This patch does not overlap with any of the others, yay.
      ret.hunks.push(cloneHunk(theirsCurrent, theirsOffset));
      theirsIndex++;
      mineOffset += theirsCurrent.newLines - theirsCurrent.oldLines;
    } else {
      // Overlap, merge as best we can
      let mergedHunk = {
        oldStart: Math.min(mineCurrent.oldStart, theirsCurrent.oldStart),
        oldLines: 0,
        newStart: Math.min(mineCurrent.newStart + mineOffset, theirsCurrent.oldStart + theirsOffset),
        newLines: 0,
        lines: []
      };
      mergeLines(mergedHunk, mineCurrent.oldStart, mineCurrent.lines, theirsCurrent.oldStart, theirsCurrent.lines);
      theirsIndex++;
      mineIndex++;

      ret.hunks.push(mergedHunk);
    }
  }

Maybe I'm missing something, but just by inspection, I think I immediately see a bug in that else block at the end - namely that it doesn't handle the case where one hunk from mine overlaps with two separate hunks from theirs, or vice versa. So it'll (I think) produce a patch with overlapping, perhaps conflicting, hunks. I should probably create a test case demonstrating this.

ExplodingCabbage avatar Jun 26 '24 19:06 ExplodingCabbage

Yeah, the bug I spotted in review is real. Here's a demo script:

diff = require('.');

patch1 = `--- foo.txt   2024-06-26 20:47:03.936414424 +0100
+++ bar.txt 2024-06-26 20:47:04.892377928 +0100
@@ -1,10 +1,10 @@
 one
-two
-three
-four
-five
-six
-seven
-eight
-nine
-ten
+zwei
+drei
+vier
+fünf
+sechs
+sieben
+acht
+neun
+zehn
`

patch2 = `--- foo.txt   2024-06-26 20:47:03.936414424 +0100
+++ baz.txt 2024-06-26 20:47:06.544314866 +0100
@@ -1,4 +1,4 @@
-one
+un
 two
 three
 four
@@ -7,4 +7,4 @@
 seven
 eight
 nine
-ten
+dix
`

diff.merge(patch1, patch2)

There's a conflict between the two patches (the first one deletes ten and replaces it with zehn, while the second one deletes ten and replaces it with dix) and so we SHOULD get one hunk with conflict: true. But this gets missed by merge and we don't get conflict: true; instead we get a patch with these overlapping hunks:

[
  {
    oldStart: 1,
    oldLines: 10,
    newStart: 1,
    newLines: 10,
    lines: [
      '-one',   '+un',     '-two',
      '-three', '-four',   '-five',
      '-six',   '-seven',  '-eight',
      '-nine',  '-ten',    '+zwei',
      '+drei',  '+vier',   '+fünf',
      '+sechs', '+sieben', '+acht',
      '+neun',  '+zehn'
    ]
  },
  {
    oldStart: 7,
    oldLines: 4,
    newStart: 7,
    newLines: 4,
    lines: [ ' seven', ' eight', ' nine', '-ten', '+dix' ]
  }
]

ExplodingCabbage avatar Jun 26 '24 19:06 ExplodingCabbage

Plan:

  • add test cases for the bug above and for https://github.com/kpdecker/jsdiff/issues/341.
  • rewrite the core logic for merging hunks (in merge and mergeLines) from scratch, without paying too much attention to what's already there. See if I can get it right - i.e. end up with something that passes all tests

If I also screw up and create something buggy, then I'll see if I can learn something from what's already there and synthesize a working implementation.

ExplodingCabbage avatar Jun 26 '24 20:06 ExplodingCabbage

Oh man - I just looked at tests and realised that the existing code intentionally tries to support the case where the two patches use a different base file (rather than just throwing an error in that situation). Ugh, why? I struggle to reason about the use case for that or figure out what correct behaviour is. I don't see how even attempting to do that can possibly be useful, because if you're allowing for the possibility that your patches are based on slightly different base files, then you've got to handle the possibility that two hunks that are really patching the "same" bit of the file start from different old line numbers. But that requires a completely different algorithm, that matches hunks from the two patches against each other based on (possibly fuzzy) matching of shared context (like when applying a patch), and for which "correct" results are often going to be outright different to an algorithm that doesn't attempt to support merging patches with different bases.

It just seems ill-conceived to me; IMO if you're two patches make different claims (in context or removal lines) about what the old content of line 52 was, you should immediately throw an error saying "Patches disagree about old content of line 52". Treating this as a conflict, like when one patch wants to delete line 52 and another patch wants to insert a line after line 52, isn't correct; it's a fundamentally different situation that calls the validity of the entire merge into question.

ExplodingCabbage avatar Jun 29 '24 18:06 ExplodingCabbage

Hmm - on the one hand, trying to apply a patch to a file that doesn't exactly match its expectations is pretty common; that's what conflicts are. But on the other hand, if you're trying to apply to merge two patches into a coherent whole, they need some sort of common ancestor to start from.

I can imagine a situation where you "zip" two branches together, merging the commits in chronological order - although that seems ill-advised in general - and then you might end up with two patches that both disagree about what the base should look like because neither takes into account the preceding partial merge. But at that point I think you just have to pick a patch to merge first, not try to merge both at the same time, so you're not really talking about a 3-way merge anymore.

ehoogeveen-medweb avatar Jun 29 '24 18:06 ehoogeveen-medweb

The more I think about this the more I think it's actually super-complicated.

Here are six different tasks that you might characterise as merging two patches, but which are different from each other:

  1. You have two patches to merge that are presumed to have the exact same base file... a. ... and you also have the full content of that base file b. ... but you don't have the content of the base file
  2. You have two patches to merge that may be patching different versions of the base file... a. ... and you have those two versions and their common ancestor b. ... and you have those two versions but NOT any common ancestor c. ... and you have a common ancestor, but not the versions the patches apply to d. ... and you don't have the source of any version whatsoever of the file you're patching

Each of these six different scenarios probably requires a slightly different algorithm to any of the other five, and they require different interfaces and ways of resolving conflicts, too, since in some cases you can present a file with merge conflicts in it like Git does, and in others you can't.

Note also that even in the absence of conflicts, the behaviours of these functions have to differ. Suppose you are merging the following two patches, without a copy of the base file...

Patch 1:

--- file.js
+++ file2.js
@@ -943,7 +943,6 @@
   if (!elit.sit) {
     return [];
   }
-  const sed = elit[0];
   return eiusmod.tempor(sed) ? sed : [sed];
 }
 function incididunt(ipsum, ut = 1) {

Patch 2:

--- file.js
+++ file2.js
@@ -917,7 +917,6 @@
   if (!elit.sit) {
     return [];
   }
-  const sed = elit[0];
   return eiusmod.tempor(sed) ? sed : [sed];
 }
 function incididunt(ipsum, ut = 1) {

If you are trying to perform task 1b - merging two patches that have the same base - then your code should "reason" along these lines:

Well, the old line numbers indicate that these hunks don't overlap. They happen to have the exact same context lines, but since the line numbers are absolutely trustworthy, that must be coincidence; the same lines must simply appear at two points in the base file. Therefore, to merge these patches, I should produce a patch that includes both hunks.

And you end up with something like this:

--- file.js
+++ file2.js
@@ -917,7 +917,6 @@
   if (!elit.sit) {
     return [];
   }
-  const sed = elit[0];
   return eiusmod.tempor(sed) ? sed : [sed];
 }
 function incididunt(ipsum, ut = 1) {
@@ -943,7 +942,6 @@
   if (!elit.sit) {
     return [];
   }
-  const sed = elit[0];
   return eiusmod.tempor(sed) ? sed : [sed];
 }
 function incididunt(ipsum, ut = 1) {

But if you are trying to merge the very same two patches in order to perform task 2d - merging two patches whose bases are presumed to be non-identical, albeit with some common ancestor - then your code should "reason" about them differently. It should instead think to itself:

Well, the context for these two hunks is identical. That probably means they're both deleting the same line. Yes, the line numbers don't match - they give start line numbers for the hunk in the base file 26 lines apart - but since we know the bases have diverged, that could easily because the first patch was generated using a base file where 26 lines of extra content had previously been inserted higher up in the file. We fundamentally can't know for sure, but the most reasonable guess we can make is that these two patches represent the same change, and we should therefore output a "merged" patch with a single hunk.

And then you get output something like this (albeit it's not clear what the line numbers should be - here I've averaged them!)

--- file.js
+++ file2.js
@@ -930,7 +930,6 @@
   if (!elit.sit) {
     return [];
   }
-  const sed = elit[0];
   return eiusmod.tempor(sed) ? sed : [sed];
 }
 function incididunt(ipsum, ut = 1) {

So two functions with similar looking interfaces - both take a pair of patches and no source file, and output a new patch - and which both are "merging" patches need different algorithms that will output fundamentally different results!

And maybe both of these are potentially useful functions that should exist. But the code that currently exists in merge.js seems to be half-heartedly trying to fulfil both use cases at once, and it fundamentally can't succeed at that, because they require different algorithms that will output different results given the same inputs.

ExplodingCabbage avatar Jun 29 '24 19:06 ExplodingCabbage

Okay my new plan is to abandon this. :P

I'm not gonna close any of the issues. But I'm not gonna do more work on it either. I don't have a good grasp of the use cases in which people want to merge patches, and I now realise it's not actually one potential feature, but (at least) six potential features in a trenchcoat, none of which are simple either algorithmically or in terms of API design. I welcome others taking a stab at this, but I don't personally want to sink time into it; apologies to anyone who saw me suggest I would and got excited!

ExplodingCabbage avatar Jun 29 '24 19:06 ExplodingCabbage

My scenario wasn't that difficult. In my case, it was used for racing save operations.

Assume you opened a source file, and made a few fixes, and forgot to save. Then in a different text editor, you fixed another thing, and saved it. When you go back to that first editor, you realize you need all those unsaved things, so you press save. If it allowed it to be saved, it would clobber the edits you did in that other editor.

To handle it, I set base to the state of the file when the editor initialized its content. That state is in both histories - you know that the other editor with the racing edit was identical to the base at some point in its history. So you give the initial editor content as base, you give the content you tried to save as mine, and you give the conflicting content the other editor saved as theirs. If there is no conflict, you end up bringing their changes into the file you are attempting to save, then you start over with content that includes the other editor's changes, now only your changes that you forgot to save are different, and it saves safely.

On Sat, Jun 29, 2024 at 3:55 PM Mark Amery @.***> wrote:

Okay my new plan is to abandon this. :P

I'm not gonna close any of the issues. But I'm not gonna do more work on it either. I don't have a good grasp of the use cases in which people want to merge patches, and I now realise it's not actually one potential feature, but (at least) six potential features in a trenchcoat, none of which are simple either algorithmically or in terms of API design. I welcome others taking a stab at this, but I don't personally want to sink time into it; apologies to anyone who saw me suggest I would and got excited!

— Reply to this email directly, view it on GitHub https://github.com/kpdecker/jsdiff/issues/181#issuecomment-2198321694, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA23YELXNYXJ4Z5CP4I3WPTZJ4GKJAVCNFSM6AAAAABJ2C6DAWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOJYGMZDCNRZGQ . You are receiving this because you commented.Message ID: @.***>

doug65536 avatar Jun 30 '24 14:06 doug65536