diff-match-patch
diff-match-patch copied to clipboard
patch_fromText and lines breaks issue in Python
Hello,
I am looking to utilize this fantastic library for applying patches to a file, but I'm encountering issues with broken line breaks. I have attempted to use the patch_make function to compare the behavior, and it operates effectively when the diff is generated by dmp. Could someone please guide me in identifying the potential issues in my diff file?
The wrong result :
"express": "^4.18.2", "lodash": "^4.17.21",
The expected one :
"express": "^4.18.2",
"lodash": "^4.17.21",
This is my simple test code :
from diff_match_patch import diff_match_patch
original_json = '''{
"type": "module",
"dependencies": {
"ejs": "^3.1.9",
"express": "^4.18.2",
"puppeteer": "^21.7.0"
}
}'''
expected_json = '''{
"type": "module",
"dependencies": {
"ejs": "^3.1.9",
"express": "^4.18.2",
"lodash": "^4.17.21",
"puppeteer": "^21.7.0"
}
}'''
# The unidiff string, which is an exact clone of what dmp generates for the same changes
diff_str = '''@@ -82,16 +82,42 @@
.18.2",
+ "lodash": "^4.17.21",
"pup'''
print("____________ BROKEN EXAMPLE")
dmp = diff_match_patch()
patches = dmp.patch_fromText(diff_str)
patch_text = dmp.patch_toText(patches)
new_text, _ = dmp.patch_apply(patches, original_json)
print(patch_text)
print(new_text)
print("____________ WORKING EXAMPLE")
patches = dmp.patch_make(original_json, expected_json)
patch_text = dmp.patch_toText(patches)
new_text, _ = dmp.patch_apply(patches, original_json)
print(patch_text)
print(new_text)
And the result
____________ BROKEN EXAMPLE
@@ -82,16 +82,42 @@
.18.2%22,
+ %22lodash%22: %22%5E4.17.21%22,
%22pup
{
"type": "module",
"dependencies": {
"ejs": "^3.1.9",
"express": "^4.18.2", "lodash": "^4.17.21",
"puppeteer": "^21.7.0"
}
}
____________ WORKING EXAMPLE
@@ -82,16 +82,42 @@
.18.2%22,%0A
+ %22lodash%22: %22%5E4.17.21%22,%0A
%22pup
{
"type": "module",
"dependencies": {
"ejs": "^3.1.9",
"express": "^4.18.2",
"lodash": "^4.17.21",
"puppeteer": "^21.7.0"
}
}
We can clearely see that dmp adds a "%0A" (an encoded \n) in the generated patch, which is missing when creating it from text.
- I conducted various tests on multiple patch formats.
- Upon inspecting the core of the DMP, I discovered that the final aggregated text does not include line breaks in the diffs.
Version info :
diff-match-patch 20230430
system Ubuntu
python 3.10.12
@weeger did you confirm if this works as expected in the other languages? in JavaScript, Java, Objective C, or any others?
Same problem, just tested in Javascript :
// Import the diff-match-patch library
const DiffMatchPatch = require('diff-match-patch');
const dmp = new DiffMatchPatch();
// Original JSON string
const originalJson = `{
"type": "module",
"dependencies": {
"ejs": "^3.1.9",
"express": "^4.18.2",
"puppeteer": "^21.7.0"
}
}`;
// Expected JSON string with a new dependency added
const expectedJson = `{
"type": "module",
"dependencies": {
"ejs": "^3.1.9",
"express": "^4.18.2",
"lodash": "^4.17.21",
"puppeteer": "^21.7.0"
}
}`;
// Unidiff string representing the change
const diffStr = '@@ -82,16 +82,42 @@\n .18.2",\n+ "lodash": "^4.17.21",\n "pup';
console.log("____________ BROKEN EXAMPLE");
// Converting the diff string to patches
let patches = dmp.patch_fromText(diffStr);
// Converting patches back to a diff string for display
let patchText = dmp.patch_toText(patches);
// Applying patches to the original JSON string
let [newText, results] = dmp.patch_apply(patches, originalJson);
console.log(patchText);
console.log(newText);
console.log("____________ WORKING EXAMPLE");
// Creating patches directly from original and expected JSON strings
patches = dmp.patch_make(originalJson, expectedJson);
// Converting patches to a diff string for display
patchText = dmp.patch_toText(patches);
// Applying these patches to the original JSON
[newText, results] = dmp.patch_apply(patches, originalJson);
console.log(patchText);
console.log(newText);
Result
____________ BROKEN EXAMPLE
@@ -82,16 +82,42 @@
.18.2%22,
+ %22lodash%22: %22%5E4.17.21%22,
%22pup
{
"type": "module",
"dependencies": {
"ejs": "^3.1.9",
"express": "^4.18.2",
"lodash": "^4.17.21", "puppeteer": "^21.7.0"
}
}
____________ WORKING EXAMPLE
@@ -82,16 +82,42 @@
.18.2%22,%0A
+ %22lodash%22: %22%5E4.17.21%22,%0A
%22pup
{
"type": "module",
"dependencies": {
"ejs": "^3.1.9",
"express": "^4.18.2",
"lodash": "^4.17.21",
"puppeteer": "^21.7.0"
}
}
there one thing that stands out: in the broken example we have a patch file where the newline is percent-encoded as %0A
. it's becoming part of the diff, but the patch tool already works on the line-level.
makes me wonder if there's something about the diff
output that patch
takes that makes an assumption about a single trailing newline vs. many. as in, maybe diff-match-patch
should strip a single trailing newline from its output before percent-encoding it.
@dmsnell I just encountered this issue myself. This bug is troublesome. Do you plan to fix it and publish the fix in your improved fork? ( https://github.com/dmsnell/diff-match-patch )? That would be great. Can't believe Google abandoned this project... 😔
@Emasoft it's doubtful I'll fix this myself on any timeline someone else wants, but I'm willing to work with people who propose fixes and get it merged. I don't have the capacity do everything myself.
Lang