syrupy icon indicating copy to clipboard operation
syrupy copied to clipboard

RecursionError raised by _fancy_replace for a large snapshot

Open Tyler-Gelvin opened this issue 3 years ago • 6 comments

Describe the bug

_fancy_replace will encounter an error when analyzing a string that is large, repetitive, and slightly different from the expected snapshot.

To reproduce

  1. Create this test:
CHANGED = False
def test_big_snapshot(
    snapshot,
):
    if CHANGED:
        result = [str(x) + ("a" * 20) for x in range(3000)]
    else:
        result = [str(x) + ("a" * 20 + "b") for x in range(3000)]

    assert result == snapshot
  1. Run --snapshot-update with CHANGED = False
  2. Run tests with CHANGED = True
  3. See error
(repeats)
../lib/python3.8/difflib.py:1015: in _fancy_replace
    yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
../lib/python3.8/difflib.py:1027: in _fancy_helper
    yield from g
../lib/python3.8/difflib.py:1015: in _fancy_replace
    yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
../lib/python3.8/difflib.py:1027: in _fancy_helper
    yield from g
../lib/python3.8/difflib.py:1015: in _fancy_replace
    yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
../lib/python3.8/difflib.py:1027: in _fancy_helper
    yield from g
../lib/python3.8/difflib.py:969: in _fancy_replace
    cruncher.ratio() > best_ratio:
../lib/python3.8/difflib.py:644: in ratio
    matches = sum(triple[-1] for triple in self.get_matching_blocks())
../lib/python3.8/difflib.py:479: in get_matching_blocks
    i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi)
../lib/python3.8/difflib.py:444: in find_longest_match
    return Match(besti, bestj, bestsize)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

_cls = <class 'difflib.Match'>, a = 0, b = 0, size = 26

>   ???
E   RecursionError: maximum recursion depth exceeded while calling a Python object

Expected behavior

No error should be raised and the test should fail with a reasonable message. It would be fine to not have the full fancy error message for large snapshots.

Environment (please complete the following information):

  • OS: Windows
  • Syrupy Version: 1.5.0
  • Python Version: 3.8

Additional context

The real world use case was a 3000-row DataFrame converted to csv which had one column that was the same string for every row. When the string changed by 1 character the test appeared to hang (took more than an hour).

Tyler-Gelvin avatar Jan 14 '22 01:01 Tyler-Gelvin

Hmm, we're using difflib which is built-in. I suppose we could check the length and fallback to an equality check. We'd need to figure out a max/reasonable upper limit.

Related:

  • Same issue from 2006: https://bugs.launchpad.net/bzr/+bug/6373
  • and 2014: https://bugs.python.org/issue21253

noahnu avatar Jan 14 '22 01:01 noahnu

You say a 3000-row dataframe, it should be the column count that's the problem. We operate line by line.

A workaround would be to split up the lines prior to snapshotting the data. Open to ideas.

noahnu avatar Jan 14 '22 01:01 noahnu

We only have 5 columns:

  0,timeseries_2,2019-01-01 00:00:00,7.0,AlwaysSeven
  1,timeseries_2,2019-01-01 00:15:00,7.0,AlwaysSeven
  ...

and the change was to "always_seven".

Also changing the constant from 3000 to 3 in the above test makes it behave as expected (failing the comparison with no recursion error).

Tyler-Gelvin avatar Jan 14 '22 02:01 Tyler-Gelvin

The difflib bug has been open for 15 years. Surely it will be fixed any day now?

As for work arounds, maybe the comparison could go through the list comparing line by line first and only call difflib if there are <N rows that are different. Or maybe truncate and call difflib once N rows are different. (N maybe 20?)

Tyler-Gelvin avatar Jan 14 '22 03:01 Tyler-Gelvin

I think that makes sense. Possibly some performance concerns but we can profile.

Are you interested in contributing and putting up a PR? Otherwise I will try get to it in the next few days.

noahnu avatar Jan 14 '22 04:01 noahnu

I can give it a shot.

Tyler-Gelvin avatar Jan 18 '22 07:01 Tyler-Gelvin

https://github.com/python/cpython/issues/65452

noahnu avatar Dec 30 '22 17:12 noahnu

:tada: This issue has been resolved in version 4.0.7 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

tophat-opensource-bot avatar Jul 20 '23 12:07 tophat-opensource-bot