distributed icon indicating copy to clipboard operation
distributed copied to clipboard

test_deeply_nested_structures Failed: DID NOT RAISE <class 'TypeError'> with python3.12

Open emollier opened this issue 1 year ago • 3 comments

Describe the issue:

While working on dask distributed packaging in Debian sid, I encountered a test error affecting test_deeply_nested_structures, which seems to stem from serialize not behaving as expected.

Minimal Complete Verifiable Example:

$ python3.12 -m pytest distributed/protocol/tests/test_protocol.py
[...]
    @pytest.mark.skipif(WINDOWS, reason="On windows this is triggering a stackoverflow")
    def test_deeply_nested_structures():
        # These kind of deeply nested structures are generated in our profiling code
        def gen_deeply_nested(depth):
            msg = {}
            d = msg
            while depth:
                depth -= 1
                d["children"] = d = {}
            return msg
    
        msg = gen_deeply_nested(sys.getrecursionlimit() - 50)
>       with pytest.raises(TypeError, match="Could not serialize object"):
E       Failed: DID NOT RAISE <class 'TypeError'>

/usr/lib/python3/dist-packages/distributed/protocol/tests/test_protocol.py:173: Failed

Anything else we need to know?:

The test passes with Python 3.11.9, no problem, so something has changed in 3.12. While investigating whether the present issue would be a duplicate, I found out the way serialize was supposed to have its implementation for raising TypeError already adjusted for Python 3.12 (see change in distributed/protocol/serialize.py change introduced in merge request #8223), so this looks quite recent.

Environment:

  • Dask version: 2024.5.2
  • Python version: 3.12.4
  • Operating System: Debian sid
  • Install method: source

emollier avatar Jun 18 '24 21:06 emollier

playing around with this a bit on Python 3.13, yeah, I just can't get it to hit the codepath the test expects. Using anything from sys.getrecursionlimit() - 100 up to sys.getrecursionlimit() - 35, the serialization succeeds. Using sys.getrecursionlimit() - 34, we get a huge recursion (obviously) that starts like this:

../../BUILDROOT/usr/lib/python3.13/site-packages/distributed/protocol/serialize.py:303: in serialize
    iterate_collection = check_dask_serializable(x)
../../BUILDROOT/usr/lib/python3.13/site-packages/distributed/protocol/serialize.py:215: in check_dask_serializable
    return check_dask_serializable(next(iter(x.items()))[1])
../../BUILDROOT/usr/lib/python3.13/site-packages/distributed/protocol/serialize.py:215: in check_dask_serializable
    return check_dask_serializable(next(iter(x.items()))[1])

and goes through thousands of calls to check_dask_serializable before eventually hitting the recursion limit:

    def check_dask_serializable(x):
        if type(x) in (list, set, tuple) and len(x):
            return check_dask_serializable(next(iter(x)))
        elif type(x) is dict and len(x):
            return check_dask_serializable(next(iter(x.items()))[1])
        else:
            try:
>               dask_serialize.dispatch(type(x))
E               RecursionError: maximum recursion depth exceeded

which isn't the exception the test expects. No value seems to send the test down the path it expects to be on, where we get past that, but hit an exception other than NotImplementedError in the for name in serializers: block. I'm not sure yet what changed exactly.

AdamWill avatar Jun 20 '24 22:06 AdamWill

It's worth noting that in this comment, @fjetter describes the test as "not super useful" and says the point of this part is really to test that if "we are hitting such an error, we're raising/ignoring it according to on_error". So we could just change this part of the test so we somehow force or mock out one of the serialization families to raise an exception, instead of trying to use an operation that we think will result in that happening 'naturally'. wdyt?

AdamWill avatar Jun 20 '24 22:06 AdamWill

Hi Adam,

It took me some time wrapping my head around answering, because I did not really understand the purpose of the test initially (and naked truth is: I'm still not confident that I understood). Honestly, I'm wondering integration questions like: "what user expectation is broken if the serialize function does not raise TypeError when nearing RecursionError conditions?"

So, to your question:

It's worth noting that in this comment, @fjetter describes the test as "not super useful" and says the point of this part is really to test that if "we are hitting such an error, we're raising/ignoring it according to on_error". So we could just change this part of the test so we somehow force or mock out one of the serialization families to raise an exception, instead of trying to use an operation that we think will result in that happening 'naturally'. wdyt?

I concur with your idea of finding a more reliable way to trip the TypeError raise. If the new way does not stress the stack of the interpreter, it may even be possible to remove the test skip on Windows. Naive me would have suggested trying to run full steam ahead into the RecursionError, but I'm under the impression this is not an option. Maybe you have much workable ideas in mind, I let you to them. Thanks for your investigation this far!

Have a nice day, :) Étienne.

emollier avatar Jun 27 '24 20:06 emollier