diff: stricter uncommon lcs, match up leading/trailing ranges instead
The last patch is copied from #763 with a few modifications.
Checklist
If applicable:
- [ ] I have updated
CHANGELOG.md - [ ] I have updated the documentation (README.md, docs/, demos/)
- [ ] I have updated the config schema (cli/src/config-schema.json)
- [x] I have added tests to cover my changes
I haven't looked in-depth into #763 to know if this is a good change, but in any case, if we want this change, we should add a test case that tests the case where a diff result contains both the results of the leading/trailing code and the LCS code. As it is, all tests pass with this diff:
diff --git a/lib/src/diff.rs b/lib/src/diff.rs
index 57fb9cc2..af19cdb1 100644
--- a/lib/src/diff.rs
+++ b/lib/src/diff.rs
@@ -177,7 +177,8 @@ pub(crate) fn unchanged_ranges(
if common_leading_len > 0 {
let (left_leading_ranges, left_ranges) = left_ranges.split_at(common_leading_len);
let (right_leading_ranges, right_ranges) = right_ranges.split_at(common_leading_len);
- let mut result = unchanged_ranges_lcs(left, right, left_ranges, right_ranges);
+ // let mut result = unchanged_ranges_lcs(left, right, left_ranges, right_ranges);
+ let mut result = vec![];
result.splice(
0..0,
iter::zip(
@@ -197,7 +198,8 @@ pub(crate) fn unchanged_ranges(
left_ranges.split_at(left_ranges.len() - common_trailing_len);
let (right_ranges, right_trailing_ranges) =
right_ranges.split_at(right_ranges.len() - common_trailing_len);
- let mut result = unchanged_ranges_lcs(left, right, left_ranges, right_ranges);
+ // let mut result = unchanged_ranges_lcs(left, right, left_ranges, right_ranges);
+ let mut result = vec![];
result.extend(iter::zip(
left_trailing_ranges.iter().cloned(),
right_trailing_ranges.iter().cloned(),
The non-test code looks reasonable. Thanks for separating out the logical steps into their own commits.
add a test case that tests the case where a diff result contains both the results of the leading/trailing code and the LCS code. As it is, all tests pass with this diff:
Good point. Actually there was a bug that leading/trailing matches weren't trimmed from both ends, and that was the reason we had recursion there, I think. I've added a patch that removes recursion.
OK, removing the recursion makes sense and addresses my concern above.
Let me summarize my current thoughts for other reviewers:
- This PR makes LCS more strict by requiring that if we use uncommon words, they must appear the same number of times in the LHS and RHS. But now, a diff between "a a a" and "a a" will have no matches, so this PR adds the matching prefix/suffix rule to compensate.
- If we believe that we need the same-number-of-times rule, then this PR is good.
- I'm not sure whether we really need the same-number-of-times rule.
So I would say that this PR is good from a code standpoint, but I'm not sure whether it's good from a feature standpoint. Maybe someone from #763 could review this PR.
This PR makes LCS more strict by requiring that if we use uncommon words, they must appear the same number of times in the LHS and RHS. But now, a diff between "a a a" and "a a" will have no matches, so this PR adds the matching prefix/suffix rule to compensate.
If we believe that we need the same-number-of-times rule, then this PR is good.
I'm not sure whether we really need the same-number-of-times rule.
Thanks for summarizing it.
To add some context, I think the current implementation is a bit weird in that we choose the first min(left, right) occurrences. There's no clue that the first N matching pairs should line up, not the last N. Practically, this change tends to disable pairing unbalanced whitespace characters in color-words diffs. The resulting diff looks more readable to me.
I think the result is good so long as "uncommon LCS" matching is attempted first.
What happens if we don't?
could produce inferior diffs. If I understand it correctly, Git roughly does LCS -> widen -> LCS ..., Bzr does LCS -> LCS -> ... -> widen.
instead of