jsdiff
jsdiff copied to clipboard
Adding "merge" flag to all diffMethods improve diff readability
I'd like to suggest adding a "merge" option to the default diff methods to make diffs more readable/useful. Here's a diffWords & diffChars example visualizing current behavior vs my proposed addition:
With the flag, additions and removals are now "merged" and separated by unchanged parts. Here's the code used to build these examples
function coloredSpan(text: string, color: string) {
return `<${color === "red" ? "s" : "span"} style="color:${color}">${text}</${color === "red" ? "s" : "span"}>`;
}
let outputString = "";
let changes = Diff.diffWords(oldState, newState);
if (shouldMerge) {
changes = mergeChanges(changes)
}
changes.forEach((part: any) => {
outputString += coloredSpan(part.value, part.added ? "green" : part.removed ? "red" : "black");
});
And the code to build the merged diff array
function mergeChanges(changes: Diff.Change[]) {
// create accumulators for the added and removed text. Once a neutral part is encountered, merge the diffs and reset the accumulators
let addedText = "";
let addedCount = 0;
let removedText = "";
let removedCount = 0;
let mergedChanges = [];
for (const part of changes) {
if (part?.added) {
addedText += part.value;
addedCount += part.count ?? 0;
} else if (part?.removed) {
removedText += part.value;
removedCount += part.count ?? 0;
} else if (part.value.length <= 5) {
// we ignore small unchanged segments (<= 4 characters), which catches most whitespace too
addedText += part.value;
removedText += part.value;
} else {
// if the part is not added or removed, merge the added and removed text and append to the diff alongside neutral text
mergedChanges.push({ value: removedText, removed: true, count: removedCount });
mergedChanges.push({ value: addedText, added: true, count: addedCount });
mergedChanges.push(part);
addedText = "";
addedCount = 0;
removedText = "";
removedCount = 0;
}
}
// after exiting the loop we might have ended with some added or removed text that needs to be appended
if (addedText) {
mergedChanges.push({ value: addedText, added: true, count: addedCount });
}
if (removedText) {
mergedChanges.push({ value: removedText, removed: true, count: removedCount });
}
return mergedChanges;
}
Notes: The current implementation is for discussion, I realize iterating the diff array again & making a copy are suboptimal. Potentially a lower-level change, i.e. adding a new algorithm for the diffs depending on the flag would be the best long term decision.
The conditions under which unchanged separators are ignored would also need to be reconsidered, i.e. the <=5 doesn't make sense diffChars.
If this is already supported without extending the base class happy to be pointed in that direction! Or if this feature is already under consideration.
Hmm. The specific case of diffWords should tend to give much more readable results these days since spaces are no longer treated as tokens. Is there a compelling case for this feature besides that?
Hmm. The specific case of
diffWordsshould tend to give much more readable results these days since spaces are no longer treated as tokens. Is there a compelling case for this feature besides that?
Without playing around & making more examples I'm less able to give a definitive y/n, but if the only change is spaces I'd still imagine so. Short/single words like of or the would probably still break up diff chunks that might be more clear visually if they were grouped together. That would be up to a user to toggle, i.e. a "min diff size" concept. It just depends on the use case. For me readability is more important than knowing exactly which words might have been removed