jj icon indicating copy to clipboard operation
jj copied to clipboard

diff: stricter uncommon lcs, match up leading/trailing ranges instead

Open yuja opened this issue 1 year ago • 2 comments

The last patch is copied from #763 with a few modifications.

Checklist

If applicable:

  • [ ] I have updated CHANGELOG.md
  • [ ] I have updated the documentation (README.md, docs/, demos/)
  • [ ] I have updated the config schema (cli/src/config-schema.json)
  • [x] I have added tests to cover my changes

yuja avatar Jul 02 '24 15:07 yuja

I haven't looked in-depth into #763 to know if this is a good change, but in any case, if we want this change, we should add a test case that tests the case where a diff result contains both the results of the leading/trailing code and the LCS code. As it is, all tests pass with this diff:

diff --git a/lib/src/diff.rs b/lib/src/diff.rs
index 57fb9cc2..af19cdb1 100644
--- a/lib/src/diff.rs
+++ b/lib/src/diff.rs
@@ -177,7 +177,8 @@ pub(crate) fn unchanged_ranges(
     if common_leading_len > 0 {
         let (left_leading_ranges, left_ranges) = left_ranges.split_at(common_leading_len);
         let (right_leading_ranges, right_ranges) = right_ranges.split_at(common_leading_len);
-        let mut result = unchanged_ranges_lcs(left, right, left_ranges, right_ranges);
+        // let mut result = unchanged_ranges_lcs(left, right, left_ranges, right_ranges);
+        let mut result = vec![];
         result.splice(
             0..0,
             iter::zip(
@@ -197,7 +198,8 @@ pub(crate) fn unchanged_ranges(
             left_ranges.split_at(left_ranges.len() - common_trailing_len);
         let (right_ranges, right_trailing_ranges) =
             right_ranges.split_at(right_ranges.len() - common_trailing_len);
-        let mut result = unchanged_ranges_lcs(left, right, left_ranges, right_ranges);
+        // let mut result = unchanged_ranges_lcs(left, right, left_ranges, right_ranges);
+        let mut result = vec![];
         result.extend(iter::zip(
             left_trailing_ranges.iter().cloned(),
             right_trailing_ranges.iter().cloned(),

The non-test code looks reasonable. Thanks for separating out the logical steps into their own commits.

jonathantanmy avatar Jul 02 '24 21:07 jonathantanmy

add a test case that tests the case where a diff result contains both the results of the leading/trailing code and the LCS code. As it is, all tests pass with this diff:

Good point. Actually there was a bug that leading/trailing matches weren't trimmed from both ends, and that was the reason we had recursion there, I think. I've added a patch that removes recursion.

yuja avatar Jul 03 '24 04:07 yuja

OK, removing the recursion makes sense and addresses my concern above.

Let me summarize my current thoughts for other reviewers:

  • This PR makes LCS more strict by requiring that if we use uncommon words, they must appear the same number of times in the LHS and RHS. But now, a diff between "a a a" and "a a" will have no matches, so this PR adds the matching prefix/suffix rule to compensate.
  • If we believe that we need the same-number-of-times rule, then this PR is good.
  • I'm not sure whether we really need the same-number-of-times rule.

So I would say that this PR is good from a code standpoint, but I'm not sure whether it's good from a feature standpoint. Maybe someone from #763 could review this PR.

jonathantanmy avatar Jul 08 '24 21:07 jonathantanmy

  • This PR makes LCS more strict by requiring that if we use uncommon words, they must appear the same number of times in the LHS and RHS. But now, a diff between "a a a" and "a a" will have no matches, so this PR adds the matching prefix/suffix rule to compensate.

  • If we believe that we need the same-number-of-times rule, then this PR is good.

  • I'm not sure whether we really need the same-number-of-times rule.

Thanks for summarizing it.

To add some context, I think the current implementation is a bit weird in that we choose the first min(left, right) occurrences. There's no clue that the first N matching pairs should line up, not the last N. Practically, this change tends to disable pairing unbalanced whitespace characters in color-words diffs. The resulting diff looks more readable to me.

yuja avatar Jul 09 '24 03:07 yuja

I think the result is good so long as "uncommon LCS" matching is attempted first.

What happens if we don't?

could produce inferior diffs. If I understand it correctly, Git roughly does LCS -> widen -> LCS ..., Bzr does LCS -> LCS -> ... -> widen.

image instead of image

yuja avatar Jul 09 '24 09:07 yuja