rmw_connext

race condition in graph changes and service is available

Open wjwwood opened this issue 7 years ago • 4 comments

I noticed this when debugging the flaky test in rcl called test_rcl_service_server_is_available which is in the rcl/test/rcl/test_graph.cpp file:

https://github.com/ros2/rcl/blob/db1353008bff40e87338c95fb46bcb4b85c970d6/rcl/test/rcl/test_graph.cpp#L477

The race seems to be between the graph guard condition being triggered (which wakes up any wait sets waiting on it):

https://github.com/ros2/rcl/blob/db1353008bff40e87338c95fb46bcb4b85c970d6/rcl/test/rcl/test_graph.cpp#L523

And the rcl_service_server_is_available function reporting that a service that was previously available is no longer available:

https://github.com/ros2/rcl/blob/db1353008bff40e87338c95fb46bcb4b85c970d6/rcl/test/rcl/test_graph.cpp#L542

Normally the test only checks this when a change occurs in the graph, but that caused the test to fail intermittently with Connext. So I added a special case for Connext that checks on each loop iteration, regardless of whether a graph change was detected:

https://github.com/ros2/rcl/blob/db1353008bff40e87338c95fb46bcb4b85c970d6/rcl/test/rcl/test_graph.cpp#L525-L538

The rcl_service_server_is_available function normally reported the correct state on the next loop iteration. This special case for Connext should be removed once this issue is fixed.
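
To make the difference between the two checking strategies concrete, here is a minimal, self-contained sketch. It does not use rcl or Connext at all: `FakeGraph`, its tick-based timing, and `wait_for_state` are hypothetical stand-ins invented for illustration, and the one-iteration lag is only an assumption that mirrors the suspected behavior described above.

```cpp
// Hypothetical stand-in for the real graph state; not part of rcl.
struct FakeGraph {
  int tick = 0;
  // Assumption: the graph event fires at tick 1, but the availability
  // query only reflects the change from tick 2 onward (the suspected lag).
  bool graph_changed() const { return tick == 1; }
  bool server_is_available() const { return tick >= 2; }
};

// Returns the loop iteration at which the expected state was observed,
// or -1 if it was never observed within max_loops iterations.
// check_every_loop == false: query only after a graph change was detected.
// check_every_loop == true: query on every iteration (the workaround).
int wait_for_state(FakeGraph graph, bool expected, bool check_every_loop, int max_loops) {
  for (int i = 0; i < max_loops; ++i) {
    graph.tick = i;
    if (!check_every_loop && !graph.graph_changed()) {
      continue;  // no graph change seen this iteration, so skip the query
    }
    if (graph.server_is_available() == expected) {
      return i;
    }
  }
  return -1;
}
```

In this simulation, checking only on a graph change never observes the new state, because the single notification arrives one iteration before the query result catches up, while checking on every iteration sees it on the next pass.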

This could be caused by graph changes being combined through some sort of event coalescing, or it could be a delay introduced by Connext; I'm not sure yet. I've decided to work around and document the issue rather than solve it now.

wjwwood, Oct 27 '16 02:10