rclpy
SystemError `PyCFunction with class but no METH_METHOD` during normal ROS operations
Bug report
Required Info:
- Operating System:
  - Ubuntu 22.04 running Docker (also 22.04)
- Installation type:
  - Docker, ROS installed from apt. Similar env to https://github.com/UrbanMachine/create-ros-app
- DDS implementation:
  - cyclonedds
- Client library (if applicable):
  - rclpy
Steps to reproduce issue
Work in progress. I need help isolating why this happens so that I can create a reproducible example. This issue is meant to start the conversation and to see whether others have seen something similar in the wild.
Expected behavior
No exception
Actual behavior
Occasional failures. Here are two example tracebacks from our robots in the wild:
Example 1
Traceback (most recent call last):
  File "/robot/install/hardware/lib/hardware/big_bird_4_0_capcom", line 8, in <module>
    sys.exit(main())
  File "/robot/install/node_helpers/lib/python3.10/site-packages/node_helpers/spinning/initialization.py", line 68, in spin_fn
    rclpy.spin(node, executor)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/__init__.py", line 226, in spin
    executor.spin_once()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 794, in spin_once
    self._spin_once_impl(timeout_sec)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 775, in _spin_once_impl
    handler, entity, node = self.wait_for_ready_callbacks(
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 711, in wait_for_ready_callbacks
    return next(self._cb_iter)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 517, in _wait_for_ready_callbacks
    waitables.extend(filter(self.can_execute, node.waitables))
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 457, in can_execute
    return not entity._executor_event and entity.callback_group.can_execute(entity)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/callback_groups.py", line 102, in can_execute
    with self._lock:
SystemError: attempting to create PyCFunction with class but no METH_METHOD flag
Example 2
In this case, it happened when accessing the is_cancel_requested property on a ServerGoalHandle:
Traceback (most recent call last):
  File "/robot/install/node_helpers/lib/python3.10/site-packages/node_helpers/robust_rpc/_wrappers.py", line 72, in wrapper
    return callback(goal)
  File "/robot/install/node_helpers/lib/python3.10/site-packages/node_helpers/actions/server/base_handler.py", line 91, in _on_goal_with_report
    result = self.on_goal(goal_handle)
  File "/robot/install/node_helpers/lib/python3.10/site-packages/node_helpers/actions/server/fail_fast_handler.py", line 54, in on_goal
    return super().on_goal(goal_handle)
  File "/robot/install/node_helpers/lib/python3.10/site-packages/node_helpers/actions/server/base_handler.py", line 71, in on_goal
    return worker.execute_callback(self._request_timeout)
  File "/robot/install/node_helpers/lib/python3.10/site-packages/node_helpers/actions/server/worker.py", line 82, in execute_callback
    while timeout_obj and not self.goal_handle.is_cancel_requested:
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/action/server.py", line 98, in is_cancel_requested
    return GoalStatus.STATUS_CANCELING == self.status
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/action/server.py", line 102, in status
    with self._lock:
SystemError: attempting to create PyCFunction with class but no METH_METHOD flag
Additional information
This happens a few times a day, on a single machine running ros2 and ~25-50 nodes.
This one is going to be difficult to diagnose without more information or a reproduction.
I think in the near term, there are two options we can pursue.
- Build Humble with AddressSanitizer turned on. This can have a performance impact, so it may not make sense if you are running a production system.
- Enable core dump collection, so that we can at least get a backtrace of what was going on around the point where the SystemError occurs. For a simple shell, you can enable core dumps with `ulimit -c unlimited`, but you may need a more detailed setup if you have limited permissions on your system.
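If changing the shell limit is awkward (for example when nodes are started by a launch system rather than an interactive shell), roughly the same effect can be had from inside the Python process. The following is a minimal sketch, assuming a Linux host and that the hard limit is already high enough; `enable_core_dumps` is a hypothetical helper name, not part of rclpy. Where the core file actually ends up is still decided by the kernel's core pattern setting, which this code does not touch.

```python
import resource


def enable_core_dumps():
    """Raise this process's soft core-file size limit up to its hard limit.

    Hypothetical helper (not part of rclpy). It is the in-process
    counterpart of `ulimit -c`, but it can only go as high as the hard
    limit already allows.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may always raise its soft limit to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    # Return the limits now in effect so the caller can log them.
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    unlimited = soft == resource.RLIM_INFINITY
    print(f"core limit: soft={soft} hard={hard} unlimited={unlimited}")
```

Calling this early in the node's entry point, before `rclpy.init()`, would apply the limit to the node process and to any child processes it starts afterwards, since resource limits are inherited.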
Adding a bit of information here: I noticed this issue happens most frequently when is_cancel_requested is called on the ServerGoalHandle in a tight loop. After I added a sleep(0.001) to that loop, the problem stopped happening.
I'll keep adding details here as I get more signal on the root cause.
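For anyone who wants to try the same mitigation, the throttled polling loop looks roughly like the sketch below. It is self-contained and does not import rclpy: `FakeGoalHandle` is a hypothetical stand-in that, like the real ServerGoalHandle in the traceback above, reads its state under a lock, and `wait_for_cancel` is an illustrative name. This only reduces how often the lock is taken; it is a workaround, not a fix for the underlying SystemError.

```python
import threading
import time


class FakeGoalHandle:
    """Hypothetical stand-in for rclpy's ServerGoalHandle (not the real class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cancel_requested = False

    @property
    def is_cancel_requested(self):
        # Mirrors the real property's shape: state is read under a lock.
        with self._lock:
            return self._cancel_requested

    def request_cancel(self):
        with self._lock:
            self._cancel_requested = True


def wait_for_cancel(goal_handle, timeout_sec, poll_interval_sec=0.001):
    """Poll is_cancel_requested until it is True or the timeout expires.

    Sleeping between polls keeps the loop from hitting the lock as fast
    as the interpreter allows. Returns True if a cancel was seen.
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if goal_handle.is_cancel_requested:
            return True
        time.sleep(poll_interval_sec)
    return False
```

With a real goal handle, the call would be something like `wait_for_cancel(goal_handle, timeout_sec=5.0)` inside the execute callback. The 1 ms interval matches the value that made the problem stop in the report above; a different value may be needed on other systems.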