rclpy BufferClient lookup_transform not working, timeout
Bug report
Required Info:
- Operating System:
- Docker + Ubuntu 20.04
- Installation type:
- From binaries
- Version or commit hash:
- Foxy
- DDS implementation:
- Fast-RTPS,
- Client library (if applicable):
- rclpy
Steps to reproduce issue
Calling BufferClient.lookup_transform always timeouts although if I call the buffer server via ros2 cli it works.
Create a node like the following:
#!/usr/bin/env python3
import rclpy
from rclpy.node import Node
from tf2_ros.buffer_client import BufferClient
from rclpy.duration import Duration
from std_msgs.msg import Empty
class TestNode(Node):
def __init__(
self,
):
super().__init__("test_node")
self.cb_group = None
self.buffer = BufferClient(
self,
"/tf2_buffer_server",
timeout_padding=1.0,
callback_group=self.cb_group,
)
self.sub = self.create_subscription(
msg_type=Empty,
topic="/test_buffer_client",
callback=self.buffer_cb,
qos_profile=1,
)
def buffer_cb(self, msg):
_ = self.buffer.lookup_transform(
"map",
"odom",
rclpy.time.Time(seconds=0, nanoseconds=0),
timeout=Duration(seconds=1.0),
)
self.get_logger().info("Success")
def main(args=None):
rclpy.init(args=args)
node = TestNode()
executor = None
rclpy.spin(node, executor)
if __name__ == "__main__":
main()
Make sure the buffer server is already running and there is a valid transform between map->odom. Then publish to the topic that calls BufferClient.lookup_transform.
ros2 topic pub -1 /test_buffer_client std_msgs/msg/Empty
The traceback is:
Traceback (most recent call last):
File "/workspace/rover/ros2/install/navigation/lib/navigation/navigation_test", line 33, in <module>
sys.exit(load_entry_point('navigation', 'console_scripts', 'navigation_test')())
File "/workspace/rover/ros2/build/navigation/navigation/node_test.py", line 53, in main
rclpy.spin(node, executor)
File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/__init__.py", line 191, in spin
executor.spin_once()
File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 711, in spin_once
raise handler.exception()
File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/task.py", line 239, in __call__
self._handler.send(None)
File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 426, in handler
await call_coroutine(entity, arg)
File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 351, in _execute_subscription
await await_or_execute(sub.callback, msg)
File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 118, in await_or_execute
return callback(*args)
File "/workspace/rover/ros2/build/navigation/navigation/node_test.py", line 33, in buffer_cb
_ = self.buffer.lookup_transform(
File "/opt/ros/foxy/lib/python3.8/site-packages/tf2_ros/buffer_client.py", line 87, in lookup_transform
return self.__process_goal(goal)
File "/opt/ros/foxy/lib/python3.8/site-packages/tf2_ros/buffer_client.py", line 201, in __process_goal
raise tf2.TimeoutException("The LookupTransform goal sent to the BufferServer did not come back in the specified time. Something is likely wrong with the server")
tf2.TimeoutException: The LookupTransform goal sent to the BufferServer did not come back in the specified time. Something is likely wrong with the server
Expected behavior
It shouldn't crash since it works with the ros2 cli. If you use the following it works:
ros2 action send_goal /tf2_buffer_server tf2_msgs/action/LookupTransform '{target_frame: map, source_frame: odom}'
Actual behavior
It timeouts every time.
Additional notes
I modified https://github.com/ros2/geometry2/blob/ros2/tf2_ros_py/tf2_ros/buffer_client.py#L61 to accept also a callback group that is passed to the ActionClient. Then, in the TestNode example above I set the self.cb_group=ReentrantCallbackGroup() and the executor to be a MultiThreadedExecutor. Only that way it works.
I am missing something?
I am also having the same issue in rclpp, when using this inside a callback the BufferClient.lookupTransform timeouts.
@clalancette I see there are some tests for this but not for testing it inside a callback function.
I have the same problem - I think this feature is actually not ready to be used?
I am also having the same issue in
rclpp, when using this inside a callback theBufferClient.lookupTransformtimeouts. @clalancette I see there are some tests for this but not for testing it inside a callback function.
For better or worse, that is by design. The problem with doing this is that you are blocking the executor, but the executor needs to run in order to get the transform.
The solution here is to make sure to never run blocking functions like lookupTransform from within a callback, or you use the MultiThreadedExecutor, or you instantiate the TransformListener with spin_thread = True.
I am also having the same issue in
rclpp, when using this inside a callback theBufferClient.lookupTransformtimeouts. @clalancette I see there are some tests for this but not for testing it inside a callback function.For better or worse, that is by design. The problem with doing this is that you are blocking the executor, but the executor needs to run in order to get the transform.
The solution here is to make sure to never run blocking functions like
lookupTransformfrom within a callback, or you use the MultiThreadedExecutor, or you instantiate theTransformListenerwithspin_thread = True.
BufferClient doesn't have any TransformListener. Also using MultiThreadedExecutor does not work, you have to pass a ReentrantCallbackGroup to the client inside the BufferClient class -which currently is not exposed- as already mentioned in the issue.
This could be fixed by spinning a separate callback group in a separate thread. Independent execution of callback groups is currently only supported in rclcpp, but this pr aimed at implementing them https://github.com/ros2/rclpy/pull/909. TF2 also features a not so clean fix. You can pass the TransformListener a newly created node object. This new node is used to gather the transforms in parallel if you also set spin_thread to True.
I think there is a misunderstanding here. The issue described in this thread refers to the BufferClient class which has nothing to do with a TransformListener. The BufferClient is just wrapping a rclpy.action.client.ActionClient to call a tf2_ros buffer_server using that client.
Oh nvm. I missread this.
I do have the same problem right now. Do I really need to change the BufferClient Interface or is there an other way of handling this issue?
Hi all, thanks very much for your input. I ran into this issue and I've found a solution that doesn't require changing the BufferClient interface, though I do think this should be changed. @clalancette is there any progress on this? A quick and dirty way is to change the Node's default callback group to a ReentrantCallbackGroup() then using a multithreaded excecutor to run the node. This way when you create the BufferClient it will use a ReentrantCallbackGroup by default. Some example code below:
class ExampleActionServer(Node):
def __init__(self):
super().__init__('example_action_server')
# Set the nodes default callback group to ReentrantCallbackGroup()
self._default_callback_group = ReentrantCallbackGroup()
self.tf_buffer = BufferClient(self, "/tf_action")
self.tf_buffer.action_client.wait_for_server(5.0)
self._action_server = ActionServer(
self,
ExampleAction,
'example_action',
self.execute_callback)
def main():
rclpy.init()
excecutor = rclpy.executors.MultiThreadedExecutor()
example_action_server = ExampleActionServer()
excecutor.add_node(example_action_server)
excecutor.spin()
I've found that the buffer client does work from a callback this way.
This blog is quite relevant for this issue to better understand it; https://karelics.fi/deadlocks-in-rclpy/
I would propose to the maintainers to put the service client of the BufferClient in a separate callback group, like is done for the subscribers of the transform listener
For people running into this problem:
Setting spin_thread to true did not fix my problem of doing transform lookups within a callback. Adding a rclpy.executors.MultiThreadedExecutor did.
Also, for people that have timeout/blocking issues in ROS2 in general, the multithreaded executor helps. But keep in mind that the default callback group is mutually exclusive and you need to select a reentrant callback group (or multiple mutually exclusive ones) for the involved callbacks to use multiple threads in the multithreaded executor.