diff-match-patch icon indicating copy to clipboard operation
diff-match-patch copied to clipboard

patch_fromText and lines breaks issue in Python

Open weeger opened this issue 10 months ago • 6 comments

Hello,

I am looking to utilize this fantastic library for applying patches to a file, but I'm encountering issues with broken line breaks. I have attempted to use the patch_make function to compare the behavior, and it operates effectively when the diff is generated by dmp. Could someone please guide me in identifying the potential issues in my diff file?

The wrong result :

"express": "^4.18.2",    "lodash": "^4.17.21",

The expected one :

 "express": "^4.18.2",
 "lodash": "^4.17.21",

This is my simple test code :

from diff_match_patch import diff_match_patch

original_json = '''{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "puppeteer": "^21.7.0"
  }
}'''

expected_json = '''{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}'''

# The unidiff string, which is an exact clone of what dmp generates for the same changes
diff_str = '''@@ -82,16 +82,42 @@
 .18.2",
+    "lodash": "^4.17.21",
     "pup'''

print("____________ BROKEN EXAMPLE")
dmp = diff_match_patch()
patches = dmp.patch_fromText(diff_str)
patch_text = dmp.patch_toText(patches)
new_text, _ = dmp.patch_apply(patches, original_json)
print(patch_text)
print(new_text)

print("____________ WORKING EXAMPLE")
patches = dmp.patch_make(original_json, expected_json)
patch_text = dmp.patch_toText(patches)
new_text, _ = dmp.patch_apply(patches, original_json)
print(patch_text)
print(new_text)

And the result

____________ BROKEN EXAMPLE
@@ -82,16 +82,42 @@
 .18.2%22,
+    %22lodash%22: %22%5E4.17.21%22,
     %22pup

{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}
____________ WORKING EXAMPLE
@@ -82,16 +82,42 @@
 .18.2%22,%0A
+    %22lodash%22: %22%5E4.17.21%22,%0A
     %22pup

{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}

We can clearely see that dmp adds a "%0A" (an encoded \n) in the generated patch, which is missing when creating it from text.

  • I conducted various tests on multiple patch formats.
  • Upon inspecting the core of the DMP, I discovered that the final aggregated text does not include line breaks in the diffs.

Version info :

diff-match-patch 20230430
system Ubuntu
python 3.10.12

weeger avatar Apr 09 '24 06:04 weeger

@weeger did you confirm if this works as expected in the other languages? in JavaScript, Java, Objective C, or any others?

dmsnell avatar Apr 09 '24 15:04 dmsnell

Same problem, just tested in Javascript :

// Import the diff-match-patch library
const DiffMatchPatch = require('diff-match-patch');
const dmp = new DiffMatchPatch();

// Original JSON string
const originalJson = `{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "puppeteer": "^21.7.0"
  }
}`;

// Expected JSON string with a new dependency added
const expectedJson = `{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}`;

// Unidiff string representing the change
const diffStr = '@@ -82,16 +82,42 @@\n .18.2",\n+    "lodash": "^4.17.21",\n     "pup';

console.log("____________ BROKEN EXAMPLE");
// Converting the diff string to patches
let patches = dmp.patch_fromText(diffStr);
// Converting patches back to a diff string for display
let patchText = dmp.patch_toText(patches);
// Applying patches to the original JSON string
let [newText, results] = dmp.patch_apply(patches, originalJson);
console.log(patchText);
console.log(newText);

console.log("____________ WORKING EXAMPLE");
// Creating patches directly from original and expected JSON strings
patches = dmp.patch_make(originalJson, expectedJson);
// Converting patches to a diff string for display
patchText = dmp.patch_toText(patches);
// Applying these patches to the original JSON
[newText, results] = dmp.patch_apply(patches, originalJson);
console.log(patchText);
console.log(newText);

Result

____________ BROKEN EXAMPLE
@@ -82,16 +82,42 @@
 .18.2%22,
+    %22lodash%22: %22%5E4.17.21%22,
     %22pup

{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",    "puppeteer": "^21.7.0"
  }
}
____________ WORKING EXAMPLE
@@ -82,16 +82,42 @@
 .18.2%22,%0A
+    %22lodash%22: %22%5E4.17.21%22,%0A
     %22pup

{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}

weeger avatar Apr 13 '24 11:04 weeger

there one thing that stands out: in the broken example we have a patch file where the newline is percent-encoded as %0A. it's becoming part of the diff, but the patch tool already works on the line-level.

makes me wonder if there's something about the diff output that patch takes that makes an assumption about a single trailing newline vs. many. as in, maybe diff-match-patch should strip a single trailing newline from its output before percent-encoding it.

dmsnell avatar Apr 24 '24 07:04 dmsnell

@dmsnell I just encountered this issue myself. This bug is troublesome. Do you plan to fix it and publish the fix in your improved fork? ( https://github.com/dmsnell/diff-match-patch )? That would be great. Can't believe Google abandoned this project... 😔

Emasoft avatar Apr 28 '24 16:04 Emasoft

@Emasoft it's doubtful I'll fix this myself on any timeline someone else wants, but I'm willing to work with people who propose fixes and get it merged. I don't have the capacity do everything myself.

dmsnell avatar Apr 29 '24 09:04 dmsnell

Lang

17351269 avatar May 28 '24 13:05 17351269