comparePlus icon indicating copy to clipboard operation
comparePlus copied to clipboard

Mismatched lines

Open GitsHubber opened this issue 2 years ago • 19 comments

Lines from one document seem to be matched/compared to lines from another that they shouldn't compared to. That is to say that there are lines which are a far better/closer match. Also sometimes they are matched/compared to lines which are not much of a match, and infact have no equivalent line in the other document.

Perhaps better explained with a screenshot example.

https://images2.imgbox.com/30/7e/3tNRGzNT_o.png

GitsHubber avatar Mar 29 '23 17:03 GitsHubber

I think this is also related.. just a simple text compare:

image

thereddestdog avatar Apr 03 '23 14:04 thereddestdog

@GitsHubber ,

Could you please show a screenshot of the same compare but without hiding the matched lines (that is without Show Only Diffs option)? Could you also restore the default coloring just for the screenshot? I suspect that in between the diff sections you also have matched sections and the changed lines are looked for and marked only between the corresponding diff sections. So even if there is a better match for your changed line elsewhere in the file it depends on the diff context (the positioning of the line in the file corresponding to the other one). This screenshot should make things more clear:

changed_lines

Here line 1 is better matched to line 3 and vice versa but because of line 2 those are separated and cannot be linked. The lines order between files has also changed.

@thereddestdog ,

You example looks good to me, what do you consider wrong there?

BR

pnedev avatar Apr 04 '23 09:04 pnedev

@pnedev I think the extra inserted lines are not correct. I think it should look like this:

image

Thank you for a great plugin!

thereddestdog avatar Apr 05 '23 17:04 thereddestdog

@thereddestdog ,

Thank you for the feedback. You are right - old Compare plugin and the new ComparePlus plugin use different approaches when finding changed lines. There is no one universal solution and criteria on what is the right outcome (it depends on the text itself) - it is more intuitive for us people and we also have certain expectations.

For example look at these screenshots that illustrate the different approaches in another case:

compareplus110 compare202

For your example ComparePlus gives a "right" result but it kind-of stresses out that lines 10-11 are moved before lines 8-9 (so lines 8-9 and 10-11 have changed places) and are also changed in the beginning - R1, R2 are the diffs and the rest of the lines contents is the match context (sort-of-speak).

Compare plugin also gives a "right" result claiming that no lines have changed places (lines order is the same) only the end of the lines are different (cu and op210) and the rest is the match context (sort-of-speak).

Which result is better cannot be determined with certainty - it depends on the context and on which difference matters most.

Here is an example with your files data but slightly modified:

compareplus110 compare202

BR

pnedev avatar Apr 06 '23 12:04 pnedev

Hello Pavel,

We've discussed this issue before. ComparePlus is absolutely brilliant, and as you wrote: "Which result is better cannot be determined with certainty - it depends on the context and on which difference matters most.".

Allow me to seize the opportunity and bring another example from my recent real-life Compare.

Could you please compare these files? Go to case IDC_COL_INITNUM_EDIT.

Result:

תמונה

I've tried several on-line Compare sites, and most indeed display that section as CP does. But I've come across https://onlinetextcompare.com/ whose display seems clearer in this case.

תמונה

I do understand the issue is quite complex, and ComparePlus' approach is certainly much better in many cases. So, this is just another example of this issue. :)

Thank you very much for your ongoing work. Much appreciated. 👍

BR

Yaron10 avatar Apr 08 '23 14:04 Yaron10

Hello my friend Yaron,

I appreciate your feedback as always. Detecting the most "appealing" to us changed lines pairs is an ambiguous task indeed. And I am still not totally happy with its implementation as well.

Great example! 👍 I have tried to make ComparePlus more customizable and to be able to accept "guidance" from us as to what we would like to stress in a comparison. ComparePlus is by default punctual and pays attention to the empty lines as part of the compare context so in your example files to achieve the same result as https://onlinetextcompare.com/ you should set the Min line resemblance to mark as changed in the plugin Settings to 20% (or less) and also to use the Ignore Empty Lines option. That way you as a user decide to exclude the empty lines as something important and remove them from the compare context so the plugin tries to "look for differences" that are more important to you in that particular compare.

Happy holidays! BR, Pavel

pnedev avatar Apr 08 '23 21:04 pnedev

Hello Pavel,

I hope you had a good day.

And I am still not totally happy with its implementation as well.

I am very happy! :) And obviously - like in every field in life, there's always room for some improvements.

I've mentioned it several times before: just analyzing the various possible scenarios and the ultimate desired result is a major task. Implementing it is a mammoth project. I truly appreciate and admire your work.

It's been a few years now, but I have never fully and thoroughly analyzed the best Compare results. Please remember that when reading the following comment. :)

I set Min line resemblance to mark as changed to 20% a long time ago. Checking Ignore Empty Lines, the case IDC_COL_INITNUM_EDIT section is indeed displayed very clearly. The "ultimate desired result" in this case. 👍

Could you please look at two more sections?

1. case IDC_COL_DEC_RADIO.

תמונה

תמונה

2. case CBN_SELCHANGE.

תמונה

תמונה

Thank you very much and best wishes.

Yaron10 avatar Apr 09 '23 19:04 Yaron10

Hello again Yaron,

Thank you. You are right about the new examples in your last post - I noticed them as well and they seem "sub-optimal" but I need to analyze them thoroughly and see if they could be improved without compromising other compare cases. I currently don't have the time to do so. They are now expected because of the } break; constructs that give recognizable match compare context.

I will write back if I come with some improvement on this.

BR

pnedev avatar Apr 10 '23 11:04 pnedev

Hello Pavel,

Whenever you find time to look into it and possibly revise some elements in the current approach, it would be highly appreciated.

Analyzing only the few examples in this thread - The ultimate goal is letting the user grasp the diffs as quickly, easily and clearly as possible. Inserting new (alignment) lines, and thus creating "steps" - is a major slowing/confusing factor. It seems that NOT creating "steps" whenever possible would be much better.


תמונה

vs.

תמונה

I do understand the logic and advantage in the first display, but the second one is simpler more easily "grasped".


תמונה

The logic of this smart detection and display is obvious. But I'm still wondering if it's worth the confusing "steps". Perhaps even here marking lines 2 & 4 as "not-equal" and lines 3 as "added/removed" should be better.


תמונה

Again, the logic of this smart detection and display is obvious. But is it worth extra-lines/steps?


Bottom line: It's quite complex.

Thanks again for your time and hard work. 👍

Yaron10 avatar Apr 10 '23 13:04 Yaron10

Hi Yaron,

Thanks again for the feedback and your thoughts on it, I appreciate it. I'll think about a way to improve this and make the changed detection decision even "smarter" but I won't have much free time in the near future so whenever I get to it. I'll currently focus mainly on fixing major issues with ComparePlus although there are some really good and useful feature requests and I also have some ideas for additional functionalities.

BR

pnedev avatar Apr 11 '23 10:04 pnedev

Hello Pavel,

Whenever you have free time it'll be appreciated.

I'll currently focus mainly on fixing major issues with ComparePlus

What "major issues"? :) Do you mean this thread? I don't mean to rush you whatsoever - but the more I think about it, the more important it seems.

Allow me a few more points regarding the current issue. I know that suggesting is much easier than implementing. Thank you for your patience.


תמונה

I've mentioned the extra-lines/steps as a potential slowing/confusing factor. That is: the user has to analyze 6 lines displayed in "steps" instead of "straight" 4.

Another factor is the lines numbers.

  1. We don't play with shuffled cards. The fact that two lines, although different, are positioned in line (e.g.) 8 in both files is important, and should (IMO) be taken into account by the "smart detection".

  2. Keeping lines 8 aligned should also make the user grasp the diffs more quickly.


The different lines background colors is also a major slowing factor. That is: the "smart detection" should prefer to flow with the same block-background-color when possible.

תמונה


although there are some really good and useful feature requests and I also have some ideas for additional functionalities.

👍 I'll be looking forward to them. Thank you so much. I do appreciate it.

Yaron10 avatar Apr 11 '23 16:04 Yaron10

Hello Yaron,

What "major issues"? :)

When such issues appear, that's what I meant ;) https://github.com/pnedev/comparePlus/issues/350 was one such issue - good catch :+1:

I've mentioned the extra-lines/steps as a potential slowing/confusing factor. That is: the user has to analyze 6 lines displayed in "steps" instead of "straight" 4.

Well, I think that nevertheless that's the right approach. In the specific example given by @thereddestdog both of you are right that it will be better to align the lines and have one group of four changed lines. But more generally (and algorithmically speaking) it is better to find the most appropriate changed lines candidates and that will inevitably separate and displace the corresponding blocks. That way for example if you have a large compared code section that differs and is comprised of totally new code parts (that inevitably has the same semantic language constructs) and a slightly altered existing code part the user will quickly grasp that there is an existing slightly changed part (perhaps a commented section) and a totally new functionality block. Very often the corresponding added/removed blocks pair have different added/removed portion sizes as well.

The different lines background colors is also a major slowing factor.

Are you suggesting to change the colorization so there are no yellow portions but only green/red ones with highlighted and aligned changed portions/lines (just like the Github compare and https://onlinetextcompare.com/) ? I also think that might be a better display IDK for sure.

Thank you for thinking about how to further optimize and make existing CP functionalities even better and more intuitive. I appreciate that :+1:

BR

pnedev avatar Apr 12 '23 12:04 pnedev

Hello Pavel,

When such issues appear, that's what I meant.

Much appreciated. :)

Well, I think that nevertheless that's the right approach. ... But more generally (and algorithmically speaking) it is better to find the most appropriate changed lines candidates and that will inevitably separate and displace the corresponding blocks.

I trust and highly respect your judgement.

Still, please allow me to ask: are you also referring to the case IDC_COL_INITNUM_EDIT, case IDC_COL_DEC_RADIO & case CBN_SELCHANGE sections? IMO it's quite obvious that aligning those blocks should be much clearer for the user. You also labeled them as "sub-optimal".

If changing the approach and implementation is complex, I'd certainly understand and respect that too.

Are you suggesting to change the colorization so there are no yellow portions but only green/red ones with highlighted and aligned changed portions/lines?

Not categorically. I wanted to emphasize that different background colors can also be visually distracting, and that's another reason to align blocks like case IDC_COL_INITNUM_EDIT.

I had another case in mind (possibly more complex; please ignore if it is). Could you compare these files?

תמונה

Even if the label="Accounts" lines should be marked as "changed", it might be better (in this case) to mark them as "added/removed" so that the we have only two background colors in the block.


You wrote you didn't have free time these days, and here we are engaging in complex discussions. :) Please let me know if you'd rather get back to it some time in the future.

Thank you very much once again. 👍

BR

Yaron10 avatar Apr 13 '23 14:04 Yaron10

Hello Yaron,

You wrote you didn't have free time these days, ....... Please let me know if you'd rather get back to it some time in the future.

Its OK, I don't mind discussing how to further improve ComparePlus, it's a pleasure :) I just won't have much time soon to implement new, complex features or changes.

Still, please allow me to ask: are you also referring to the case IDC_COL_INITNUM_EDIT, case IDC_COL_DEC_RADIO & case CBN_SELCHANGE sections?

IDC_COL_INITNUM_EDIT is OK as it is now because if we ignore the empty lines the compare context changes and the comparison is as expected. As to IDC_COL_DEC_RADIO and CBN_SELCHANGE - those are compared OK now (by that I mean algorithmically it is expected and accurate) but from user perspective it is not optimal for sure and there is currently no way to "guide" the compare process to the desired compare context. Changing that will be complex but I'll think about making things better (when I find the time and energy).

I wanted to emphasize that different background colors can also be visually distracting...

Yes, I agree. I am not sure though that keeping the background green/red only will be better. Maybe it is worth trying and experimenting with it.

Even if the label="Accounts" lines should be marked as "changed", it might be better (in this case) to mark them as "added/removed" so that the we have only two background colors in the block.

Unfortunately I cannot do much about that. The only possibility would be to provide a way for the user to disable changed lines matching process. Or to use green/red background for changed lines as mentioned above.

Thank you for the feedback and sharing your thoughts about change detection implementation. :+1:

BR

pnedev avatar Apr 14 '23 14:04 pnedev

Hello Pavel,

Its OK, I don't mind discussing how to further improve ComparePlus, it's a pleasure :)

Great. It's a pleasure for me as well.

but from user perspective it is not optimal for sure and there is currently no way to "guide" the compare process to the desired compare context. Changing that will be complex but I'll think about making things better (when I find the time and energy).

👍 Time, energy and mood. I do appreciate it.

I am not sure though that keeping the background green/red only will be better.

I may not have been clear about that, but I did not suggest it categorically. On the contrary, the "tricolor" in general is useful and helpful.

Unfortunately I cannot do much about that...

That point was not as important.

Thank you. 👍 I wish you the best. - Energy and mood. :)

Yaron10 avatar Apr 14 '23 14:04 Yaron10

Hello Yaron,

I have "optimized" compare diffs in the same dev binaries pointed in https://github.com/pnedev/comparePlus/issues/353#issuecomment-1513139107. For convenience I have pointed to the same dev binaries here: win32, win64 Now if you ignore the empty lines you would get the same result with your example files as when using https://onlinetextcompare.com/. That was a very helpful feedback, thank you for sharing it :+1: I haven't come across such compare example before. I probably still need to do some optimizations (in speed mainly, the need for those might not be user-noticeable at the moment) but the dev binaries are perfectly usable as they are now.

BR

pnedev avatar Apr 18 '23 13:04 pnedev

Hello Pavel,

Thank you for this major improvement. Both the columnEditor pair and the Chrome pair look "more correct", and the diffs are much easier to grasp. 👍

I appreciate your time and great work.

If you currently don't have time (or energy) for further discussing it, would you mind keeping this issue open?

Before the recent changes, the following section was "correctly" aligned even if Ignore Empty Lines was unchecked.

תמונה

With the recent commit:

תמונה

Don't you think the first results are better regardless of Ignore Empty Lines? It's like pulling a short blanket. :)


I probably still need to do some optimizations (in speed mainly, the need for those might not be user-noticeable at the moment) but the dev binaries are perfectly usable as they are now.

Thank you for that too and best of luck.

Yaron10 avatar Apr 18 '23 18:04 Yaron10

Hello Yaron,

Thank you. I intend to keep this thread open for some time.

Don't you think the first results are better regardless of Ignore Empty Lines?

They are better for sure, you are right. But I need to analyze the compare algorithm thoroughly to check if and how something can be further improved.

BR

pnedev avatar Apr 18 '23 21:04 pnedev

Hello Pavel,

I know that analyzing the compare algorithm should require a lot of and work. Whenever you find the time and energy, It would be highly appreciated. 👍

BR

Yaron10 avatar Apr 19 '23 11:04 Yaron10