test_deeply_nested_structures Failed: DID NOT RAISE <class 'TypeError'> with python3.12
Describe the issue:
While working on dask distributed packaging in Debian sid, I encountered a test error affecting test_deeply_nested_structures, which seems to stem from serialize not behaving as expected.
Minimal Complete Verifiable Example:
$ python3.12 -m pytest distributed/protocol/tests/test_protocol.py
[...]
@pytest.mark.skipif(WINDOWS, reason="On windows this is triggering a stackoverflow")
def test_deeply_nested_structures():
# These kind of deeply nested structures are generated in our profiling code
def gen_deeply_nested(depth):
msg = {}
d = msg
while depth:
depth -= 1
d["children"] = d = {}
return msg
msg = gen_deeply_nested(sys.getrecursionlimit() - 50)
> with pytest.raises(TypeError, match="Could not serialize object"):
E Failed: DID NOT RAISE <class 'TypeError'>
/usr/lib/python3/dist-packages/distributed/protocol/tests/test_protocol.py:173: Failed
Anything else we need to know?:
The test passes with Python 3.11.9, no problem, so something has changed in 3.12. While investigating whether the present issue would be a duplicate, I found out the way serialize was supposed to have its implementation for raising TypeError already adjusted for Python 3.12 (see change in distributed/protocol/serialize.py change introduced in merge request #8223), so this looks quite recent.
Environment:
- Dask version: 2024.5.2
- Python version: 3.12.4
- Operating System: Debian sid
- Install method: source
playing around with this a bit on Python 3.13, yeah, I just can't get it to hit the codepath the test expects. Using anything from sys.getrecursionlimit() - 100 up to sys.getrecursionlimit() - 35, the serialization succeeds. Using sys.getrecursionlimit() - 34, we get a huge recursion (obviously) that starts like this:
../../BUILDROOT/usr/lib/python3.13/site-packages/distributed/protocol/serialize.py:303: in serialize
iterate_collection = check_dask_serializable(x)
../../BUILDROOT/usr/lib/python3.13/site-packages/distributed/protocol/serialize.py:215: in check_dask_serializable
return check_dask_serializable(next(iter(x.items()))[1])
../../BUILDROOT/usr/lib/python3.13/site-packages/distributed/protocol/serialize.py:215: in check_dask_serializable
return check_dask_serializable(next(iter(x.items()))[1])
and goes through thousands of calls to check_dask_serializable before eventually hitting the recursion limit:
def check_dask_serializable(x):
if type(x) in (list, set, tuple) and len(x):
return check_dask_serializable(next(iter(x)))
elif type(x) is dict and len(x):
return check_dask_serializable(next(iter(x.items()))[1])
else:
try:
> dask_serialize.dispatch(type(x))
E RecursionError: maximum recursion depth exceeded
which isn't the exception the test expects. No value seems to send the test down the path it expects to be on, where we get past that, but hit an exception other than NotImplementedError in the for name in serializers: block. I'm not sure yet what changed exactly.
It's worth noting that in this comment, @fjetter describes the test as "not super useful" and says the point of this part is really to test that if "we are hitting such an error, we're raising/ignoring it according to on_error". So we could just change this part of the test so we somehow force or mock out one of the serialization families to raise an exception, instead of trying to use an operation that we think will result in that happening 'naturally'. wdyt?
Hi Adam,
It took me some time wrapping my head around answering, because I did not really understand the purpose of the test initially (and naked truth is: I'm still not confident that I understood). Honestly, I'm wondering integration questions like: "what user expectation is broken if the serialize function does not raise TypeError when nearing RecursionError conditions?"
So, to your question:
It's worth noting that in this comment, @fjetter describes the test as "not super useful" and says the point of this part is really to test that if "we are hitting such an error, we're raising/ignoring it according to on_error". So we could just change this part of the test so we somehow force or mock out one of the serialization families to raise an exception, instead of trying to use an operation that we think will result in that happening 'naturally'. wdyt?
I concur with your idea of finding a more reliable way to trip the TypeError raise. If the new way does not stress the stack of the interpreter, it may even be possible to remove the test skip on Windows. Naive me would have suggested trying to run full steam ahead into the RecursionError, but I'm under the impression this is not an option. Maybe you have much workable ideas in mind, I let you to them. Thanks for your investigation this far!
Have a nice day, :) Étienne.