syrupy
RecursionError raised by _fancy_replace for a large snapshot
Describe the bug
difflib's _fancy_replace raises a RecursionError when comparing a received value that is large, repetitive, and only slightly different from the expected snapshot.
To reproduce
- Create this test:
CHANGED = False

def test_big_snapshot(
    snapshot,
):
    if CHANGED:
        result = [str(x) + ("a" * 20) for x in range(3000)]
    else:
        result = [str(x) + ("a" * 20 + "b") for x in range(3000)]
    assert result == snapshot
- Run pytest --snapshot-update with CHANGED = False
- Run the tests again with CHANGED = True
- See error
(repeats)
../lib/python3.8/difflib.py:1015: in _fancy_replace
    yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
../lib/python3.8/difflib.py:1027: in _fancy_helper
    yield from g
../lib/python3.8/difflib.py:1015: in _fancy_replace
    yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
../lib/python3.8/difflib.py:1027: in _fancy_helper
    yield from g
../lib/python3.8/difflib.py:1015: in _fancy_replace
    yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
../lib/python3.8/difflib.py:1027: in _fancy_helper
    yield from g
../lib/python3.8/difflib.py:969: in _fancy_replace
    cruncher.ratio() > best_ratio:
../lib/python3.8/difflib.py:644: in ratio
    matches = sum(triple[-1] for triple in self.get_matching_blocks())
../lib/python3.8/difflib.py:479: in get_matching_blocks
    i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi)
../lib/python3.8/difflib.py:444: in find_longest_match
    return Match(besti, bestj, bestsize)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_cls = <class 'difflib.Match'>, a = 0, b = 0, size = 26
> ???
E RecursionError: maximum recursion depth exceeded while calling a Python object
Expected behavior
No error should be raised and the test should fail with a reasonable message. It would be fine to not have the full fancy error message for large snapshots.
Environment (please complete the following information):
- OS: Windows
- Syrupy Version: 1.5.0
- Python Version: 3.8
Additional context
The real-world use case was a 3000-row DataFrame converted to CSV, which had one column containing the same string in every row. When that string changed by one character, the test appeared to hang (it took more than an hour).
Hmm, we're using difflib, which is built in. I suppose we could check the length and fall back to an equality check. We'd need to figure out a reasonable upper limit.
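A minimal sketch of that idea, not syrupy's actual code (diff_or_equality and MAX_DIFF_LINES are hypothetical names, and the threshold value is a guess):

import difflib

# Hypothetical upper limit on lines before skipping the fancy diff entirely.
MAX_DIFF_LINES = 1000

def diff_or_equality(a, b):
    """Diff two line sequences, falling back to a plain equality check
    when the input is too large for difflib's recursive passes."""
    if max(len(a), len(b)) > MAX_DIFF_LINES:
        # Fallback: report a mismatch without rendering a line-by-line diff.
        return [] if a == b else ["snapshot does not match (too large to diff)"]
    return list(difflib.ndiff(a, b))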
Related:
- Same issue from 2006: https://bugs.launchpad.net/bzr/+bug/6373
- and 2014: https://bugs.python.org/issue21253
You say a 3000-row dataframe, but it should be the column count that's the problem, since we operate line by line.
A workaround would be to split up the lines prior to snapshotting the data. Open to ideas.
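One possible reading of that workaround, sketched for the DataFrame case described above (the pandas usage and df contents are assumptions, not code from this issue):

import pandas as pd

def test_dataframe_csv(snapshot):
    # Stand-in for the real 3000-row DataFrame described in the report.
    df = pd.DataFrame({"value": [7.0] * 3000, "label": ["AlwaysSeven"] * 3000})
    # Snapshot the CSV as a list of lines instead of one large string, so
    # each row is serialized and compared as its own element.
    assert df.to_csv().splitlines() == snapshot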
We only have 5 columns:
0,timeseries_2,2019-01-01 00:00:00,7.0,AlwaysSeven
1,timeseries_2,2019-01-01 00:15:00,7.0,AlwaysSeven
...
and the change was to "always_seven".
Also changing the constant from 3000 to 3 in the above test makes it behave as expected (failing the comparison with no recursion error).
The difflib bug has been open for 15 years. Surely it will be fixed any day now?
As for workarounds, maybe the comparison could go through the list comparing line by line first and only call difflib if there are fewer than N rows that are different. Or maybe truncate and call difflib once N rows are different. (N maybe 20?)
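A rough sketch of the first idea, just to make the proposal concrete (guarded_diff and MAX_DIFFERING_LINES are made-up names, and the positional scan is only an approximation of counting differing rows):

import difflib
from itertools import zip_longest

# Hypothetical cut-off, using the N = 20 suggested above.
MAX_DIFFERING_LINES = 20

def guarded_diff(a, b):
    """Count differing lines with a cheap positional scan first, and only
    hand the sequences to difflib when few enough lines differ."""
    differing = sum(1 for x, y in zip_longest(a, b) if x != y)
    if differing == 0:
        return []
    if differing > MAX_DIFFERING_LINES:
        return [f"{differing} lines differ (diff omitted for large snapshots)"]
    return list(difflib.ndiff(a, b))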
I think that makes sense. Possibly some performance concerns but we can profile.
Are you interested in contributing and putting up a PR? Otherwise I will try to get to it in the next few days.
I can give it a shot.
https://github.com/python/cpython/issues/65452
:tada: This issue has been resolved in version 4.0.7 :tada:
The release is available on:
- v4.0.7
- GitHub release
Your semantic-release bot :package::rocket: