phpunit icon indicating copy to clipboard operation
phpunit copied to clipboard

PHPUnit should not create a diff when expected or actual string are too large

Open arnaud-lb opened this issue 5 years ago • 8 comments

Q A
PHPUnit version 7.x
PHP version 7.1
Installation Method Composer

When comparing values that happen to not be equal, phpunit might try to display a nice diff. If the values are large enough, a memory-efficient implementation is used to compute the diff (MemoryEfficientLongestCommonSubsequenceCalculator).

The diff algorithm can take hours to compute a diff on accidentally large strings.

arnaud-lb avatar May 10 '19 08:05 arnaud-lb

The same issue reproduced for me too

Darmaru avatar Dec 04 '19 17:12 Darmaru

Unless a bug can be found in the MemoryEfficientLongestCommonSubsequenceCalculator you can try using the TimeEfficientLongestCommonSubsequenceCalculator for these cases.

You might have better luck having this resolved by finding the original authors of the calculators and discuss the details based on you input with them, as the details in the current report(s) are not enough to work with.

SpacePossum avatar Dec 09 '19 11:12 SpacePossum

I guess that the algorithms are pretty standard, and that they work as intended. One is time efficient (but memory hungry), the other is memory efficient (and time hungry).

If that's the case, and if there is no more efficient diffing algorithm, the issue is in PHPUnit: It should not try to generate a full minimal diff when it's not going to finish in an acceptable time. It could fallback to just finding the first difference in two inputs (which would not require to find the longuest common substring), or to use an algorithm that only tries to find an approximation of the minimal possible diff.

arnaud-lb avatar Dec 09 '19 16:12 arnaud-lb

If that's the case, and if there is no more efficient diffing algorithm, the issue is in PHPUnit:

I didn't check if PHPUnit has some logic to select a calculator, or leaves it to this package; @ see: https://github.com/sebastianbergmann/diff/blob/3.0.2/src/Differ.php#L193

It might be worth checking if this selection logic is working as intended. Maybe someone who is effected by this can check?

SpacePossum avatar Dec 17 '19 10:12 SpacePossum

I have the same issue, Can we use the yetanotherape/diff-match-patch package to replace the original diff algorithm.

cxlblm avatar Oct 24 '20 06:10 cxlblm

Did you try to make diff with yetanotherape/diff-match-patch , did work considerable faster? Do you maybe have some test scenario/data to see the difference?

SpacePossum avatar Oct 26 '20 08:10 SpacePossum

I've created a reproducer / demo here: https://github.com/sebastianbergmann/diff/pull/107

I've also tried yetanotherape/diff-match-patch: Unfortunately it uses too much memory.

arnaud-lb avatar Mar 08 '22 12:03 arnaud-lb

I've been working towards a solution. Rather than removing the diff, I think it would be nice to provide a hint about what's going wrong.

To do this, we could implement a method "getPartialDiff" in the ComparisonFailure class (sebastian/comparator/src/ComparisonFailure.php); This method would first check that $expected and $actual are not bigger than a decided threshold. If they are, it would search for the first diff, and reduce $expected and $actual to a small number of lines around this diff. It would then call $differ->diff(...) just like the "getDiff" method, but with the reduced inputs.

phpunit should then use this method instead of "getDiff" when printing the exception (in phpunit/src/Framework/TestFailure.php:47).

Here is an example of output :

--- Expected
+++ Actual
@@ @@
-skipping 1337 lines...
    unchanged line 1
    unchanged line 2
    unchanged line 3
-   this line is missing
+   this line should not be here
    unchanged line 1
    unchanged line 2
    unchanged line 3
+skipping 42000 lines...

FlorianSara avatar Apr 15 '22 15:04 FlorianSara