//tutorials:py/hydroelastic_contact_basics_test failure in CI build
What happened?
//tutorials:py/hydroelastic_contact_basics_test failed on a PR build, but succeeded in two rebuilds.
https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-experimental-everything-release/5204/consoleFull
This is a first occurrence so I am closing this issue for now.
Version
No response
What operating system are you using?
Ubuntu 22.04
What installation option are you using?
No response
Relevant log output
[12:03:45 PM] ==================== Test output for //tutorials:py/hydroelastic_contact_basics_test:
[12:03:45 PM] [IPKernelApp] WARNING | debugpy_stream undefined, debugging will not be enabled
[12:03:45 PM] Running notebook as a test (non-interactive)
[12:03:45 PM] Traceback (most recent call last):
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/client.py", line 626, in _async_poll_for_reply
[12:03:45 PM] msg = await ensure_async(self.kc.shell_channel.get_msg(timeout=new_timeout))
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/util.py", line 96, in ensure_async
[12:03:45 PM] result = await obj
[12:03:45 PM] File "/usr/lib/python3/dist-packages/jupyter_client/channels.py", line 224, in get_msg
[12:03:45 PM] ready = await self.socket.poll(timeout)
[12:03:45 PM] asyncio.exceptions.CancelledError
[12:03:45 PM]
[12:03:45 PM] During handling of the above exception, another exception occurred:
[12:03:45 PM]
[12:03:45 PM] Traceback (most recent call last):
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/client.py", line 838, in async_execute_cell
[12:03:45 PM] exec_reply = await self.task_poll_for_reply
[12:03:45 PM] asyncio.exceptions.CancelledError
[12:03:45 PM]
[12:03:45 PM] During handling of the above exception, another exception occurred:
[12:03:45 PM]
[12:03:45 PM] Traceback (most recent call last):
[12:03:45 PM] File "/media/ephemeral0/ubuntu/workspace/linux-jammy-gcc-bazel-experimental-everything-release/_bazel_ubuntu/d5e9f70b55234713aa722cc9d6b555d5/sandbox/linux-sandbox/24199/execroot/_main/bazel-out/k8-opt/bin/tutorials/py/hydroelastic_contact_basics_test.runfiles/_main/tutorials/hydroelastic_contact_basics_jupyter_py_main.py", line 11, in <module>
[12:03:45 PM] main()
[12:03:45 PM] File "/media/ephemeral0/ubuntu/workspace/linux-jammy-gcc-bazel-experimental-everything-release/_bazel_ubuntu/d5e9f70b55234713aa722cc9d6b555d5/sandbox/linux-sandbox/24199/execroot/_main/bazel-out/k8-opt/bin/tutorials/py/hydroelastic_contact_basics_test.runfiles/_main/tutorials/hydroelastic_contact_basics_jupyter_py_main.py", line 7, in main
[12:03:45 PM] _jupyter_bazel_notebook_main("drake/tutorials/hydroelastic_contact_basics.ipynb", sys.argv[1:])
[12:03:45 PM] File "/media/ephemeral0/ubuntu/workspace/linux-jammy-gcc-bazel-experimental-everything-release/_bazel_ubuntu/d5e9f70b55234713aa722cc9d6b555d5/sandbox/linux-sandbox/24199/execroot/_main/bazel-out/k8-opt/bin/tutorials/py/hydroelastic_contact_basics_test.runfiles/_main/tools/jupyter/jupyter_bazel.py", line 80, in _jupyter_bazel_notebook_main
[12:03:45 PM] ep.preprocess(nb, resources={'metadata': {'path': notebook_dir}})
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 84, in preprocess
[12:03:45 PM] self.preprocess_cell(cell, resources, index)
[12:03:45 PM] File "/media/ephemeral0/ubuntu/workspace/linux-jammy-gcc-bazel-experimental-everything-release/_bazel_ubuntu/d5e9f70b55234713aa722cc9d6b555d5/sandbox/linux-sandbox/24199/execroot/_main/bazel-out/k8-opt/bin/tutorials/py/hydroelastic_contact_basics_test.runfiles/_main/tools/jupyter/jupyter_bazel.py", line 34, in preprocess_cell
[12:03:45 PM] return super().preprocess_cell(*args, **kwargs)
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 105, in preprocess_cell
[12:03:45 PM] cell = self.execute_cell(cell, index, store_history=True)
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/util.py", line 84, in wrapped
[12:03:45 PM] return just_run(coro(*args, **kwargs))
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/util.py", line 62, in just_run
[12:03:45 PM] return loop.run_until_complete(coro)
[12:03:45 PM] File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
[12:03:45 PM] return future.result()
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/client.py", line 842, in async_execute_cell
[12:03:45 PM] raise DeadKernelError("Kernel died")
[12:03:45 PM] nbclient.exceptions.DeadKernelError: Kernel died
[12:03:45 PM] ================================================================================
[12:03:45 PM] ==================== Test output for //tutorials:py/hydroelastic_contact_basics_test:
[12:03:45 PM] [IPKernelApp] WARNING | debugpy_stream undefined, debugging will not be enabled
[12:03:45 PM] Running notebook as a test (non-interactive)
[12:03:45 PM] Traceback (most recent call last):
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/client.py", line 626, in _async_poll_for_reply
[12:03:45 PM] msg = await ensure_async(self.kc.shell_channel.get_msg(timeout=new_timeout))
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/util.py", line 96, in ensure_async
[12:03:45 PM] result = await obj
[12:03:45 PM] File "/usr/lib/python3/dist-packages/jupyter_client/channels.py", line 224, in get_msg
[12:03:45 PM] ready = await self.socket.poll(timeout)
[12:03:45 PM] asyncio.exceptions.CancelledError
[12:03:45 PM]
[12:03:45 PM] During handling of the above exception, another exception occurred:
[12:03:45 PM]
[12:03:45 PM] Traceback (most recent call last):
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/client.py", line 838, in async_execute_cell
[12:03:45 PM] exec_reply = await self.task_poll_for_reply
[12:03:45 PM] asyncio.exceptions.CancelledError
[12:03:45 PM]
[12:03:45 PM] During handling of the above exception, another exception occurred:
[12:03:45 PM]
[12:03:45 PM] Traceback (most recent call last):
[12:03:45 PM] File "/media/ephemeral0/ubuntu/workspace/linux-jammy-gcc-bazel-experimental-everything-release/_bazel_ubuntu/d5e9f70b55234713aa722cc9d6b555d5/sandbox/linux-sandbox/24235/execroot/_main/bazel-out/k8-opt/bin/tutorials/py/hydroelastic_contact_basics_test.runfiles/_main/tutorials/hydroelastic_contact_basics_jupyter_py_main.py", line 11, in <module>
[12:03:45 PM] main()
[12:03:45 PM] File "/media/ephemeral0/ubuntu/workspace/linux-jammy-gcc-bazel-experimental-everything-release/_bazel_ubuntu/d5e9f70b55234713aa722cc9d6b555d5/sandbox/linux-sandbox/24235/execroot/_main/bazel-out/k8-opt/bin/tutorials/py/hydroelastic_contact_basics_test.runfiles/_main/tutorials/hydroelastic_contact_basics_jupyter_py_main.py", line 7, in main
[12:03:45 PM] _jupyter_bazel_notebook_main("drake/tutorials/hydroelastic_contact_basics.ipynb", sys.argv[1:])
[12:03:45 PM] File "/media/ephemeral0/ubuntu/workspace/linux-jammy-gcc-bazel-experimental-everything-release/_bazel_ubuntu/d5e9f70b55234713aa722cc9d6b555d5/sandbox/linux-sandbox/24235/execroot/_main/bazel-out/k8-opt/bin/tutorials/py/hydroelastic_contact_basics_test.runfiles/_main/tools/jupyter/jupyter_bazel.py", line 80, in _jupyter_bazel_notebook_main
[12:03:45 PM] ep.preprocess(nb, resources={'metadata': {'path': notebook_dir}})
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 84, in preprocess
[12:03:45 PM] self.preprocess_cell(cell, resources, index)
[12:03:45 PM] File "/media/ephemeral0/ubuntu/workspace/linux-jammy-gcc-bazel-experimental-everything-release/_bazel_ubuntu/d5e9f70b55234713aa722cc9d6b555d5/sandbox/linux-sandbox/24235/execroot/_main/bazel-out/k8-opt/bin/tutorials/py/hydroelastic_contact_basics_test.runfiles/_main/tools/jupyter/jupyter_bazel.py", line 34, in preprocess_cell
[12:03:45 PM] return super().preprocess_cell(*args, **kwargs)
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 105, in preprocess_cell
[12:03:45 PM] cell = self.execute_cell(cell, index, store_history=True)
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/util.py", line 84, in wrapped
[12:03:45 PM] return just_run(coro(*args, **kwargs))
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/util.py", line 62, in just_run
[12:03:45 PM] return loop.run_until_complete(coro)
[12:03:45 PM] File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
[12:03:45 PM] return future.result()
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/client.py", line 842, in async_execute_cell
[12:03:45 PM] raise DeadKernelError("Kernel died")
[12:03:45 PM] nbclient.exceptions.DeadKernelError: Kernel died
[12:03:45 PM] ================================================================================
[12:03:45 PM] ==================== Test output for //tutorials:py/hydroelastic_contact_basics_test:
[12:03:45 PM] [IPKernelApp] WARNING | debugpy_stream undefined, debugging will not be enabled
[12:03:45 PM] Running notebook as a test (non-interactive)
[12:03:45 PM] Traceback (most recent call last):
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/client.py", line 626, in _async_poll_for_reply
[12:03:45 PM] msg = await ensure_async(self.kc.shell_channel.get_msg(timeout=new_timeout))
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/util.py", line 96, in ensure_async
[12:03:45 PM] result = await obj
[12:03:45 PM] File "/usr/lib/python3/dist-packages/jupyter_client/channels.py", line 224, in get_msg
[12:03:45 PM] ready = await self.socket.poll(timeout)
[12:03:45 PM] asyncio.exceptions.CancelledError
[12:03:45 PM]
[12:03:45 PM] During handling of the above exception, another exception occurred:
[12:03:45 PM]
[12:03:45 PM] Traceback (most recent call last):
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/client.py", line 838, in async_execute_cell
[12:03:45 PM] exec_reply = await self.task_poll_for_reply
[12:03:45 PM] asyncio.exceptions.CancelledError
[12:03:45 PM]
[12:03:45 PM] During handling of the above exception, another exception occurred:
[12:03:45 PM]
[12:03:45 PM] Traceback (most recent call last):
[12:03:45 PM] File "/media/ephemeral0/ubuntu/workspace/linux-jammy-gcc-bazel-experimental-everything-release/_bazel_ubuntu/d5e9f70b55234713aa722cc9d6b555d5/sandbox/linux-sandbox/24256/execroot/_main/bazel-out/k8-opt/bin/tutorials/py/hydroelastic_contact_basics_test.runfiles/_main/tutorials/hydroelastic_contact_basics_jupyter_py_main.py", line 11, in <module>
[12:03:45 PM] main()
[12:03:45 PM] File "/media/ephemeral0/ubuntu/workspace/linux-jammy-gcc-bazel-experimental-everything-release/_bazel_ubuntu/d5e9f70b55234713aa722cc9d6b555d5/sandbox/linux-sandbox/24256/execroot/_main/bazel-out/k8-opt/bin/tutorials/py/hydroelastic_contact_basics_test.runfiles/_main/tutorials/hydroelastic_contact_basics_jupyter_py_main.py", line 7, in main
[12:03:45 PM] _jupyter_bazel_notebook_main("drake/tutorials/hydroelastic_contact_basics.ipynb", sys.argv[1:])
[12:03:45 PM] File "/media/ephemeral0/ubuntu/workspace/linux-jammy-gcc-bazel-experimental-everything-release/_bazel_ubuntu/d5e9f70b55234713aa722cc9d6b555d5/sandbox/linux-sandbox/24256/execroot/_main/bazel-out/k8-opt/bin/tutorials/py/hydroelastic_contact_basics_test.runfiles/_main/tools/jupyter/jupyter_bazel.py", line 80, in _jupyter_bazel_notebook_main
[12:03:45 PM] ep.preprocess(nb, resources={'metadata': {'path': notebook_dir}})
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 84, in preprocess
[12:03:45 PM] self.preprocess_cell(cell, resources, index)
[12:03:45 PM] File "/media/ephemeral0/ubuntu/workspace/linux-jammy-gcc-bazel-experimental-everything-release/_bazel_ubuntu/d5e9f70b55234713aa722cc9d6b555d5/sandbox/linux-sandbox/24256/execroot/_main/bazel-out/k8-opt/bin/tutorials/py/hydroelastic_contact_basics_test.runfiles/_main/tools/jupyter/jupyter_bazel.py", line 34, in preprocess_cell
[12:03:45 PM] return super().preprocess_cell(*args, **kwargs)
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 105, in preprocess_cell
[12:03:45 PM] cell = self.execute_cell(cell, index, store_history=True)
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/util.py", line 84, in wrapped
[12:03:45 PM] return just_run(coro(*args, **kwargs))
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/util.py", line 62, in just_run
[12:03:45 PM] return loop.run_until_complete(coro)
[12:03:45 PM] File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
[12:03:45 PM] return future.result()
[12:03:45 PM] File "/usr/lib/python3/dist-packages/nbclient/client.py", line 842, in async_execute_cell
[12:03:45 PM] raise DeadKernelError("Kernel died")
[12:03:45 PM] nbclient.exceptions.DeadKernelError: Kernel died
[12:03:45 PM] ================================================================================
3/27 https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-continuous-debug/2622/console
3/28 https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-experimental-everything-release/5866/consoleFull
3/28 https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-experimental-everything-release/5869/consoleFull (twice in a row, for that PR)
3/28 https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-experimental-debug/12161/consoleFull
3/30 https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-experimental-debug/12163/consoleFull
Something has caused this test to become flaky in the past few days.
@DamrongGuoy please see if you can figure out the problem, e.g., by reproducing the failure locally.
3/28: https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-experimental-debug/12154/consoleFull
3/28: https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-experimental-everything-release/5875/consoleFull
3/28: https://drake-jenkins.csail.mit.edu/job/linux-jammy-clang-bazel-experimental-everything-release/7835/consoleFull
3/29: https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-continuous-debug/2627/
3/29: https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-nightly-everything-debug/403/ (Timeout)
3/30: https://drake-jenkins.csail.mit.edu/job/linux-jammy-clang-bazel-nightly-everything-debug/406/
3/31: https://drake-jenkins.csail.mit.edu/job/linux-jammy-clang-bazel-continuous-everything-release/1558
3/31: https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-continuous-everything-release/1136
3/31 https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-continuous-debug/2629/
3/31 https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-continuous-debug/2630/
Looking at CDash history, the earliest instance of this error message I can find was on 3/25 at https://drake-cdash.csail.mit.edu/tests/1763068472 on behalf of the PR #22778 build of https://github.com/RobotLocomotion/drake/commit/934be006075f90c31fe18e45e684e9ad06d80d30. See https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-experimental-debug/12102/consoleFull. That puts master at #22806 as the baseline.
Actually, we already hit it on 3/20 at https://drake-cdash.csail.mit.edu/tests/1758481279, on https://github.com/RobotLocomotion/drake/pull/22785, with a master baseline of #22772.
William shared a helpful CDash summary:
https://drake-cdash.csail.mit.edu/queryTests.php?project=Drake&begin=2024-01-01&end=2025-03-31&filtercount=2&showfilters=1&filtercombine=and&field1=testname&compare1=61&value1=%2F%2Ftutorials%3Apy%2Fhydroelastic_contact_basics_test&field2=status&compare2=61&value2=Failed
Filtered with test output:
https://drake-cdash.csail.mit.edu/queryTests.php?project=Drake&begin=2024-01-01&end=2025-03-31&filtercount=3&showfilters=1&filtercombine=and&field1=testname&compare1=61&value1=%2F%2Ftutorials%3Apy%2Fhydroelastic_contact_basics_test&field2=status&compare2=61&value2=Failed&field3=testoutput&compare3=95&value3=nbclient.exceptions.DeadKernelError%3A%20Kernel%20died
1/24 - 1/30: there were quite a few cases
2/4: one failure (this issue was opened)
3/18 - now: lots of failures again
I have tried a few things to repro locally, to no avail. Possibly someone else will have better luck.
Note that the CI failures take only about 5 seconds to happen, whereas a passing test takes around 20-25 seconds in release builds and much longer in debug builds. That indicates the crash is happening pretty early during startup, possibly even during notebook boot-up before import pydrake, or during import pydrake itself.
Besides trying to repro, possible next steps are:
- Disable the test on master (to prevent it interfering with unrelated PRs).
- Try to add more information to the error / error handling to get a better handle on exactly what is happening.
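As one illustration of the "more information" idea, here is a minimal sketch of reporting how a child process died instead of only that it died. It is not how Drake's jupyter_bazel.py or nbclient work; a plain child interpreter stands in for the notebook kernel, and `describe_exit` and `run_with_faulthandler` are hypothetical helper names. It assumes a POSIX host, where a negative return code means the child was killed by a signal, and it uses the standard `PYTHONFAULTHANDLER` environment variable so that a fatal signal prints a Python traceback to stderr.

```python
import os
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_with_faulthandler(code: str) -> tuple[int, str]:
    """Run `code` in a child interpreter with faulthandler enabled.

    Returns the child's return code and everything it wrote to stderr.
    """
    # PYTHONFAULTHANDLER=1 makes the child dump a traceback on fatal signals.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # Stand-in for a crashing kernel: the child aborts itself.
    returncode, stderr = run_with_faulthandler("import os; os.abort()")
    print(describe_exit(returncode))
    print(stderr)
```

Whether the real kernel process already has faulthandler enabled, and whether its stderr reaches the test log, is not something this sketch verifies.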
@jwnimmer-tri is the following related to what you described as "happening pretty early during startup, possibly even during notebook boot-up before import pydrake, or during import pydrake itself"?
/linux-jammy-gcc-bazel-experimental-everything-release/src/bindings/pydrake/BUILD.bazel:453:22: GenerateMypyStubs bindings/pydrake/pydrake/autodiffutils.pyi failed: (Exit 1): stubgen failed: error executing GenerateMypyStubs command
I saw it in https://drake-cdash.csail.mit.edu/builds/1809879
That's not related.
3/31 https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-continuous-debug/2631/
3/31 https://drake-jenkins.csail.mit.edu/job/linux-jammy-gcc-bazel-continuous-debug/2632/
I'll take this over @DamrongGuoy. At least to the point of getting a reproducible result / experiment.
Thank you for not letting me go into the wrong log file.
After reading the right log file, I still can't reproduce it. Please let me know when we know how to do it. I'll also try a few more things.
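For anyone else attempting a local repro, a rough sketch of a repeat-and-time loop follows. It is only an assumption about how one might hunt for the failure, not something used in this thread: `RunResult`, `repeat_command`, and `fast_failures` are hypothetical names, and the command run in the example is a trivial stand-in rather than the real test. The time threshold reflects the observation above that CI failures take about 5 seconds while passing runs take much longer.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class RunResult:
    returncode: int
    seconds: float


def repeat_command(cmd: list[str], runs: int) -> list[RunResult]:
    """Run `cmd` `runs` times, recording exit code and wall-clock duration."""
    results = []
    for _ in range(runs):
        start = time.monotonic()
        proc = subprocess.run(cmd, capture_output=True)
        results.append(RunResult(proc.returncode, time.monotonic() - start))
    return results


def fast_failures(results: list[RunResult],
                  threshold_seconds: float) -> list[RunResult]:
    """Select runs that failed and finished faster than the threshold."""
    return [
        r for r in results
        if r.returncode != 0 and r.seconds < threshold_seconds
    ]


if __name__ == "__main__":
    # Stand-in command that always fails quickly; replace it with the
    # actual test invocation when trying this for real.
    cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
    results = repeat_command(cmd, runs=3)
    early = fast_failures(results, threshold_seconds=10.0)
    print(f"{len(early)} of {len(results)} runs failed early")
```

Bazel also has its own `--runs_per_test` flag for repeating a test, which may be simpler than an external loop.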
Seems like //tutorials:py/hydroelastic_contact_nonconvex_mesh_test failed with a similar error (8/14): https://drake-jenkins.csail.mit.edu/view/Production/job/linux-noble-unprovisioned-gcc-bazel-nightly-release/59/
8/21 //tutorials:py/hydroelastic_contact_nonconvex_mesh_test failed again in https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/linux-noble-clang-bazel-nightly-everything-debug/65/
Next time it happens on //tutorials:py/hydroelastic_contact_nonconvex_mesh_test, please open a new issue for that notebook. It's a different notebook from the one blamed in this issue.
The hydroelastic_contact_basics_test has not failed recently, so closing this as "not planned".
Oops. It hasn't failed because it's disabled.
=> #23629
Will monitor for flakiness issues after re-enabling.