
`assert str1 == str2` takes forever with long strings that differ by a short prefix

Open sevaseva opened this issue 4 years ago • 29 comments

Environment and versions:

platform linux -- Python 3.9.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /home/edited, configfile: pytest.ini
plugins: xdist-2.3.0, timeout-1.4.2, forked-1.3.0
timeout: 10.0s
timeout method: signal
timeout func_only: False

test-and-output.zip

To reproduce: download test.py and run pytest test.py --timeout 10 -vv

EXPECTED: both test methods (test1 and test2) fail quickly with assert False.
OBSERVED: test1 fails quickly, but test2 times out after 10 seconds (or runs forever if no --timeout flag is passed).

The difference between test1 and test2 is simple:

  • test1 does x = ' long string' == 'long string'; assert x and behaves as expected, but
  • test2 does a straight assert ' long string' == 'long string', and pytest substitutes == with its own comparison logic which, apparently, has the bug.

The "long string" part of the literal strings is the same in all four places: twice in test1 and twice in test2. The two strings compared with == differ by one leading space character only. The actual strings in test.py are a bit longer (not crazy long, some 159000 ASCII characters/bytes), but they still differ by just one leading space.

Full output is attached as output.txt, including the interesting stack trace that hints at where execution is at the moment it is cancelled by the timeout. I will paste that stack trace here for convenience:

../../.local/lib/python3.9/site-packages/_pytest/assertion/rewrite.py:485: in _call_reprcompare
    custom = util._reprcompare(ops[i], each_obj[i], each_obj[i + 1])
../../.local/lib/python3.9/site-packages/_pytest/assertion/__init__.py:141: in callbinrepr
    hook_result = ihook.pytest_assertrepr_compare(
../../.local/lib/python3.9/site-packages/pluggy/hooks.py:286: in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
../../.local/lib/python3.9/site-packages/pluggy/manager.py:93: in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
../../.local/lib/python3.9/site-packages/pluggy/manager.py:84: in <lambda>
    self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
../../.local/lib/python3.9/site-packages/_pytest/assertion/__init__.py:179: in pytest_assertrepr_compare
    return util.assertrepr_compare(config=config, op=op, left=left, right=right)
../../.local/lib/python3.9/site-packages/_pytest/assertion/util.py:153: in assertrepr_compare
    explanation = _compare_eq_any(left, right, verbose)
../../.local/lib/python3.9/site-packages/_pytest/assertion/util.py:176: in _compare_eq_any
    explanation = _diff_text(left, right, verbose)
../../.local/lib/python3.9/site-packages/_pytest/assertion/util.py:241: in _diff_text
    explanation += [
../../.local/lib/python3.9/site-packages/_pytest/assertion/util.py:241: in <listcomp>
    explanation += [
/usr/lib/python3.9/difflib.py:912: in compare
    yield from g
/usr/lib/python3.9/difflib.py:979: in _fancy_replace
    cruncher.ratio() > best_ratio:
/usr/lib/python3.9/difflib.py:651: in ratio
    matches = sum(triple[-1] for triple in self.get_matching_blocks())
/usr/lib/python3.9/difflib.py:486: in get_matching_blocks
    i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi)

Thanks!

sevaseva avatar Aug 12 '21 00:08 sevaseva

For the first one, Python has already evaluated ' long string' == 'long string' as False, and then pytest just does assert False.

For the second one, pytest also evaluates assert ' long string' == 'long string' to False quickly, but now it knows more about the inputs and tries to display a useful diff of what is different, and its use of difflib.ndiff is slow (in https://github.com/pytest-dev/pytest/blob/main/src/_pytest/assertion/util.py):

def _diff_text(left: str, right: str, verbose: int = 0) -> List[str]:
    """Return the explanation for the diff between text.

    Unless --verbose is used this will skip leading and trailing
    characters which are identical to keep the diff minimal.
    """

    ... [snip] ...

    explanation += [
        line.strip("\n")
        for line in ndiff(right.splitlines(keepends), left.splitlines(keepends))
    ]
    return explanation

As a demo, if you hack that assignment (say, change it to explanation = "bleep") it runs quickly.

hugovk avatar Aug 13 '21 10:08 hugovk

Actually, it's not difflib that's slow, it's the comprehension to strip newlines. This runs fast:

     explanation += [
-        line.strip("\n")
-        for line in ndiff(right.splitlines(keepends), left.splitlines(keepends))
+        ndiff(right.splitlines(keepends), left.splitlines(keepends))
     ]
     return explanation

hugovk avatar Aug 13 '21 10:08 hugovk

Hmm, interesting. I would never have bet that line.strip("\n") would make that much of a difference. How about this?

    def strip_newline(line):
        if line.endswith("\n"):
            line = line[:-1] 
        return line

    explanation += [
        strip_newline(line)
        for line in ndiff(right.splitlines(keepends), left.splitlines(keepends))
    ]
    return explanation

Hmm perhaps that is not even necessary if we use keepends=False:

    keepends = False
    explanation += ndiff(right.splitlines(keepends), left.splitlines(keepends))        
    return explanation

nicoddemus avatar Aug 13 '21 10:08 nicoddemus

I'd guess the latter is a bit faster as it's doing things in C, but both of those time out.

hugovk avatar Aug 13 '21 11:08 hugovk

Ahh right, I see what happened. When you mentioned that this was fast:

     explanation += [
-        line.strip("\n")
-        for line in ndiff(right.splitlines(keepends), left.splitlines(keepends))
+        ndiff(right.splitlines(keepends), left.splitlines(keepends))
     ]
     return explanation

It is actually incorrect: it adds the result of ndiff as a single item to the list instead of producing all the lines and extending the list. Since ndiff returns a generator, nothing is actually being computed, which is why it is fast.

I bet then the problem is that ndiff will take a long time to produce a diff for very large texts (as expected).

nicoddemus avatar Aug 13 '21 11:08 nicoddemus
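That ndiff returns a lazy generator is easy to confirm in isolation:

```python
import difflib
import types

# difflib.ndiff returns a generator: no diffing work happens until the
# result is consumed, which is why appending the generator object itself
# to the explanation list appeared to be "fast".
g = difflib.ndiff(["a\n"], ["b\n"])
assert isinstance(g, types.GeneratorType)

# Only consuming the generator actually computes the diff.
lines = list(g)
assert any(line.startswith("- ") for line in lines)
```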

Ah yes, I should have made it clear that the things I posted weren't meant to be correct; I was just looking for the slow bits :)

hugovk avatar Aug 13 '21 12:08 hugovk

Sure thing!

Does anyone have any ideas on how we can optimize that?

nicoddemus avatar Aug 13 '21 12:08 nicoddemus

Sure thing!

Does anyone have any ideas on how we can optimize that?

I think I found a solution; it is much faster than the current code. With the current code the test failed after 2 minutes and 12 seconds, and after the fix it failed after 26 seconds.

from cdifflib import CSequenceMatcher  # third-party C reimplementation of SequenceMatcher
import difflib

# Monkeypatch difflib so ndiff picks up the faster matcher
difflib.SequenceMatcher = CSequenceMatcher


Orenef11 avatar Aug 14 '21 23:08 Orenef11

Consider a fast heuristic: "if at least one of the strings is longer than X, skip the whole diff calculation"?

sevaseva avatar Aug 15 '21 00:08 sevaseva

I believe having ndiff skip long lines may be a major win there

RonnyPfannschmidt avatar Aug 15 '21 09:08 RonnyPfannschmidt
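A size cutoff along those lines could look roughly like this (a hypothetical sketch, not pytest's actual code; MAX_DIFF_CHARS and the fallback message are illustrative):

```python
import difflib
from typing import List

MAX_DIFF_CHARS = 10_000  # illustrative cutoff, not a real pytest setting

def guarded_diff_text(left: str, right: str) -> List[str]:
    """Skip the expensive ndiff entirely when either side is huge."""
    if len(left) > MAX_DIFF_CHARS or len(right) > MAX_DIFF_CHARS:
        # Fall back to a terse message instead of an unbounded diff.
        return ["(diff skipped: strings too large to diff quickly)"]
    return [
        line.rstrip("\n")
        for line in difflib.ndiff(
            right.splitlines(keepends=True), left.splitlines(keepends=True)
        )
    ]
```

Small inputs still get the full ndiff output; pathological ones return immediately.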

Also see #8404/#7206

The-Compiler avatar Aug 15 '21 13:08 The-Compiler

I believe having ndiff skip long lines may be a major win there

Why not try to improve the ndiff method? Can I open a PR that fixes it?

Orenef11 avatar Aug 15 '21 15:08 Orenef11

ndiff is from the difflib standard library module, do you mean to reimplement it?

nicoddemus avatar Aug 15 '21 15:08 nicoddemus

ndiff is from the difflib standard library module, do you mean to reimplement it?

Yes, I know it's a standard library function. I was thinking of trying to improve the same parts that cause the slowness. Another option is to use difflib.SequenceMatcher = CSequenceMatcher, which is much faster than the pure-Python SequenceMatcher implementation.

Orenef11 avatar Aug 15 '21 15:08 Orenef11

Ahh I see. Cool, please go ahead. 👍

nicoddemus avatar Aug 15 '21 15:08 nicoddemus

as far as i understand the main cost comes from https://github.com/python/cpython/blob/ad0a8a9c629a7a0fa306fbdf019be63c701a8028/Lib/difflib.py#L902

which runs an inner sequence matcher

i suspect a quick hack that opts out of long items may be a quick relief

as ndiff is a thin wrapper, it may be possible/sensible to create a small subclass that hijacks fancy replace

RonnyPfannschmidt avatar Aug 15 '21 17:08 RonnyPfannschmidt
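Since the per-character work happens in Differ._fancy_replace, a subclass could divert long lines to the cheap plain-replace path. This is a hypothetical, fragile sketch: _fancy_replace and _plain_replace are private difflib implementation details and may change between Python versions, and MAX_LINE is an illustrative cutoff.

```python
import difflib

class QuickDiffer(difflib.Differ):
    # Hypothetical sketch relying on difflib's private _fancy_replace /
    # _plain_replace methods; these are implementation details.
    MAX_LINE = 1_000  # illustrative cutoff

    def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
        if any(len(line) > self.MAX_LINE for line in a[alo:ahi] + b[blo:bhi]):
            # Emit plain '-'/'+' lines without intra-line matching.
            yield from self._plain_replace(a, alo, ahi, b, blo, bhi)
        else:
            yield from super()._fancy_replace(a, alo, ahi, b, blo, bhi)
```

With this, comparing two very long lines that differ by a prefix completes quickly, at the cost of losing the '?' guide lines for those lines.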

@nicoddemus im wondering if it would be enough to opt into a unified diff for any large/binary sequence? that way a little context is lost, but the largest cost goes away

RonnyPfannschmidt avatar Aug 15 '21 17:08 RonnyPfannschmidt
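For comparison, a unified diff avoids the per-character matching entirely (a sketch of the trade-off, not a benchmark):

```python
import difflib

left = " " + "x" * 100_000   # differs from right by one leading space only
right = "x" * 100_000

# unified_diff matches whole lines only -- there is no per-character
# "fancy" matching -- so it stays fast even for very long lines, at the
# cost of losing intra-line change markers.
diff = list(difflib.unified_diff([right], [left], lineterm=""))
```

This completes near-instantly on inputs where ndiff effectively hangs.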

It is a possibility. However, if it is something that is opt-in, the user can easily just move the comparison outside the assert:

assert s1 == s2

To:

equal = s1 == s2
assert equal, "s1 and s2 differ"

Or do you mean it would automatically fall back to a unified diff if both strings exceed some hard-coded size?

nicoddemus avatar Aug 16 '21 15:08 nicoddemus

i meant auto-skip for huge strings/lines

RonnyPfannschmidt avatar Aug 16 '21 16:08 RonnyPfannschmidt

I see.

How do you think we can allow that? Ideally this should be something configured per-test I guess?

nicoddemus avatar Aug 16 '21 16:08 nicoddemus

@RonnyPfannschmidt @nicoddemus The solution I thought was simple to implement is not that simple.

Why not create a new view for large objects? It could be enabled only if the user has configured a flag for it.

If this is the solution, what will the output look like, and how would the difference look?

Orenef11 avatar Aug 17 '21 12:08 Orenef11

Not sure myself, I feel this should be somehow configured per-test, instead of globally through a command-line flag or configuration option.

nicoddemus avatar Aug 17 '21 12:08 nicoddemus

In my opinion, the best way is to build the output we return to the user ourselves. That way we won't depend on packages with slow performance.

To implement it, we could use the deepdiff package, which can compare very big objects. WDYT?

Orenef11 avatar Aug 17 '21 15:08 Orenef11

I propose that we first implement a simple heuristic for pathological cases (i.e. time it and pick cutoffs).

Then we can choose a less detailed diff by default for those cases and recommend that people use specific diff assertions for their use cases.

RonnyPfannschmidt avatar Aug 17 '21 15:08 RonnyPfannschmidt

You're right. I will open a PR later today, and we can discuss a basic solution there.

Orenef11 avatar Aug 17 '21 17:08 Orenef11

A similar problem (pytest taking forever to produce results) happens when there are mismatches between two big enough (len >= 5000) lists of strings or other objects, when running pytest either with the -v option or when it detects that it is running on CI [1].

iaroslav-naidon avatar Feb 09 '22 01:02 iaroslav-naidon

I stumbled upon this problem in a test that compares a PNG image with a fixture. I solved it by comparing the image's md5 and an ASCII representation instead of the raw data. It's even better because from the ASCII diff you can actually get an idea of what differs. Maybe it will be useful for someone.

from io import BytesIO
from hashlib import md5

from PIL import Image
import ascii_magic

def test_image():
    # image_data and fixture_image_data are assumed to come from
    # elsewhere in the test suite (e.g. fixtures)
    image = Image.open(BytesIO(image_data))
    ascii_image = ascii_magic.from_image(image, width_ratio=1, mode=ascii_magic.Modes.ASCII)
    assert dict(md5=md5(image_data).digest(), image=ascii_image) == fixture_image_data

nikicat avatar Nov 22 '22 14:11 nikicat

Maybe it could just fall back to a simple "assertion failed, too much data" message whenever the diff fails to complete within, say, 15 seconds?

I just got the same issue with an intermittently failing test inside a GitHub runner and... I lost 4 hours looking for a deadlock, until I reproduced it locally, dumped the call stack, and realised it hangs inside the assertion.

(simplified version)

from dataclasses import dataclass
from random import Random

__RNG = Random(42)


@dataclass(order=False, frozen=True, match_args=False, kw_only=True, slots=True)
class _DataclassModel:
    ids: list[str]
    labels: list[str]
    concept_ids_1: list[tuple[str, ...]]
    concept_ids_2: list[tuple[str, ...]]
    concept_ids_3: list[tuple[str, ...]]
    vectors: list[list[int]]
    tensors: list[list[int]]


def test_pytest():
    randoms = [__RNG.randint(0, 100) for _ in range(1000)]

    data_1 = _DataclassModel(
        ids=[f"https://data.test.com/fake/{randoms[x]}" for x in range(1000)],
        labels=[f"label_{randoms[x]}" for x in range(1000)],
        concept_ids_1=[tuple(str(randoms[x]) for _ in range(10)) for x in range(1000)],
        concept_ids_2=[tuple(str(randoms[x]) for _ in range(10)) for x in range(1000)],
        concept_ids_3=[tuple(str(randoms[x]) for _ in range(10)) for x in range(1000)],
        vectors=[[randoms[x] for _ in range(416)] for x in range(1000)],
        tensors=[[randoms[x] for _ in range(416)] for x in range(1000)],
    )

    randoms = [__RNG.randint(0, 100) for _ in range(1000)]

    data_2 = _DataclassModel(
        ids=[f"https://data.test.com/fake/{randoms[x]}" for x in range(1000)],
        labels=[f"label_{randoms[x]}" for x in range(1000)],
        concept_ids_1=[tuple(str(randoms[x]) for _ in range(10)) for x in range(1000)],
        concept_ids_2=[tuple(str(randoms[x]) for _ in range(10)) for x in range(1000)],
        concept_ids_3=[tuple(str(randoms[x]) for _ in range(10)) for x in range(1000)],
        vectors=[[randoms[x] for _ in range(416)] for x in range(1000)],
        tensors=[[randoms[x] for _ in range(416)] for x in range(1000)],
    )

    assert data_1 == data_2

vladyslav-burylov avatar May 23 '25 01:05 vladyslav-burylov

That's the idea.

It's just never been implemented.

RonnyPfannschmidt avatar May 23 '25 03:05 RonnyPfannschmidt
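Because ndiff is a generator, one way to approximate that "give up after N seconds" idea is to consume it under a time budget. This is a hypothetical sketch (the function name, budget, and fallback message are all illustrative), and it has a known limitation noted in the comments:

```python
import difflib
import time

def diff_with_budget(left_lines, right_lines, budget_s=1.0):
    # Consume ndiff incrementally and bail out with a terse message if
    # the time budget is exceeded. Caveat: a single expensive internal
    # matching step can still block before the next line is yielded, so
    # this bounds most, but not all, of the work.
    out = []
    start = time.monotonic()
    for line in difflib.ndiff(right_lines, left_lines):
        if time.monotonic() - start > budget_s:
            return ["(diff truncated: assertion diff exceeded time budget)"]
        out.append(line.rstrip("\n"))
    return out
```

Small diffs come back in full; anything that overruns the budget degrades to a one-line message instead of hanging the test run.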

I am having this issue as well when asserting two large strings. It fails after a while with a RecursionError. I would love for this to be prioritized.

martinmoldrup avatar Aug 24 '25 09:08 martinmoldrup