syrupy
RecursionError raised by _fancy_replace for a large snapshot
Describe the bug
difflib's _fancy_replace raises a RecursionError when comparing a received value that is large, repetitive, and only slightly different from the expected snapshot.
To reproduce
- Create this test:
CHANGED = False

def test_big_snapshot(
    snapshot,
):
    if CHANGED:
        result = [str(x) + ("a" * 20) for x in range(3000)]
    else:
        result = [str(x) + ("a" * 20 + "b") for x in range(3000)]
    assert result == snapshot
- Run pytest --snapshot-update with CHANGED = False
- Run the tests again with CHANGED = True
- See error
(repeats)
../lib/python3.8/difflib.py:1015: in _fancy_replace
    yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
../lib/python3.8/difflib.py:1027: in _fancy_helper
    yield from g
../lib/python3.8/difflib.py:1015: in _fancy_replace
    yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
../lib/python3.8/difflib.py:1027: in _fancy_helper
    yield from g
../lib/python3.8/difflib.py:1015: in _fancy_replace
    yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
../lib/python3.8/difflib.py:1027: in _fancy_helper
    yield from g
../lib/python3.8/difflib.py:969: in _fancy_replace
    cruncher.ratio() > best_ratio:
../lib/python3.8/difflib.py:644: in ratio
    matches = sum(triple[-1] for triple in self.get_matching_blocks())
../lib/python3.8/difflib.py:479: in get_matching_blocks
    i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi)
../lib/python3.8/difflib.py:444: in find_longest_match
    return Match(besti, bestj, bestsize)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_cls = <class 'difflib.Match'>, a = 0, b = 0, size = 26
> ???
E RecursionError: maximum recursion depth exceeded while calling a Python object
Expected behavior
No error should be raised and the test should fail with a reasonable message. It would be fine to not have the full fancy error message for large snapshots.
Environment (please complete the following information):
- OS: Windows
- Syrupy Version: 1.5.0
- Python Version: 3.8
Additional context
The real-world use case was a 3000-row DataFrame converted to CSV, which had one column containing the same string in every row. When that string changed by one character, the test appeared to hang (it took more than an hour).
Hmm, we're using difflib, which is built in. I suppose we could check the length and fall back to an equality check. We'd need to figure out a reasonable upper limit.
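A minimal sketch of that idea, not syrupy's actual code (diff_or_equality and MAX_DIFF_LINES are hypothetical names, and the threshold value is a guess):

import difflib

# Hypothetical upper limit on lines before skipping the fancy diff entirely.
MAX_DIFF_LINES = 1000

def diff_or_equality(a, b):
    """Diff two line sequences, falling back to a plain equality check
    when the input is too large for difflib's recursive passes."""
    if max(len(a), len(b)) > MAX_DIFF_LINES:
        # Fallback: report a mismatch without rendering a line-by-line diff.
        return [] if a == b else ["snapshot does not match (too large to diff)"]
    return list(difflib.ndiff(a, b))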
Related:
- Same issue from 2006: https://bugs.launchpad.net/bzr/+bug/6373
- and 2014: https://bugs.python.org/issue21253
You say a 3000-row dataframe, but it should be the column count that's the problem, since we operate line by line.
A workaround would be to split up the lines prior to snapshotting the data. Open to ideas.
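One possible reading of that workaround, sketched for the DataFrame case described above (the pandas usage and df contents are assumptions, not code from this issue):

import pandas as pd

def test_dataframe_csv(snapshot):
    # Stand-in for the real 3000-row DataFrame described in the report.
    df = pd.DataFrame({"value": [7.0] * 3000, "label": ["AlwaysSeven"] * 3000})
    # Snapshot the CSV as a list of lines instead of one large string, so
    # each row is serialized and compared as its own element.
    assert df.to_csv().splitlines() == snapshot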
We only have 5 columns:
0,timeseries_2,2019-01-01 00:00:00,7.0,AlwaysSeven
1,timeseries_2,2019-01-01 00:15:00,7.0,AlwaysSeven
...
and the change was to "always_seven".
Also changing the constant from 3000 to 3 in the above test makes it behave as expected (failing the comparison with no recursion error).
The difflib bug has been open for 15 years. Surely it will be fixed any day now?
As for workarounds, maybe the comparison could go through the list comparing line by line first and only call difflib if there are fewer than N rows that are different. Or maybe truncate and call difflib once N rows are different. (N maybe 20?)
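A rough sketch of the first idea, just to make the proposal concrete (guarded_diff and MAX_DIFFERING_LINES are made-up names, and the positional scan is only an approximation of counting differing rows):

import difflib
from itertools import zip_longest

# Hypothetical cut-off, using the N = 20 suggested above.
MAX_DIFFERING_LINES = 20

def guarded_diff(a, b):
    """Count differing lines with a cheap positional scan first, and only
    hand the sequences to difflib when few enough lines differ."""
    differing = sum(1 for x, y in zip_longest(a, b) if x != y)
    if differing == 0:
        return []
    if differing > MAX_DIFFERING_LINES:
        return [f"{differing} lines differ (diff omitted for large snapshots)"]
    return list(difflib.ndiff(a, b))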
I think that makes sense. Possibly some performance concerns but we can profile.
Are you interested in contributing and putting up a PR? Otherwise I will try to get to it in the next few days.
I can give it a shot.
https://github.com/python/cpython/issues/65452
:tada: This issue has been resolved in version 4.0.7 :tada:
The release is available on:
- v4.0.7
- GitHub release
Your semantic-release bot :package::rocket: