ensuring agent is acting as agent for all that it should

Open cmraaron opened this issue 2 years ago • 5 comments

How can we guarantee that the agent is representing all publishers, nodes, etc. as requested?

Related to the following issues and PRs:

https://github.com/micro-ROS/micro_ros_arduino/issues/40
https://github.com/micro-ROS/micro_ros_setup/issues/299
https://github.com/micro-ROS/micro_ros_setup/issues/256
https://github.com/micro-ROS/rmw_microxrcedds/pull/86/files
https://github.com/micro-ROS/micro_ros_setup/issues/347
https://github.com/micro-ROS/rmw_microxrcedds/pull/143/files
https://github.com/micro-ROS/micro_ros_arduino/issues/912

It appears the consensus is to use rmw_uros_ping_agent to detect the loss of agent communications, then fini* and re-init all the entities, as exemplified here: https://github.com/micro-ROS/micro_ros_arduino/blob/galactic/examples/micro-ros_reconnection_example/micro-ros_reconnection_example.ino
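The ping / fini / re-init cycle described above can be sketched as a small state machine. This is a minimal sketch, not the example's actual code: the state names are meant to mirror the linked example, while `agent_hooks_t`, `agent_step`, and the `sim_*` helpers are hypothetical names introduced here so the block compiles without micro-ROS headers. On a real target the `ping` hook would wrap `rmw_uros_ping_agent()` and the create/destroy hooks would init and fini the node, publishers, and so on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Connection states, intended to mirror the linked reconnection example. */
typedef enum {
    WAITING_AGENT,
    AGENT_AVAILABLE,
    AGENT_CONNECTED,
    AGENT_DISCONNECTED
} agent_state_t;

/* Hypothetical hooks (not a micro-ROS API). */
typedef struct {
    bool (*ping)(void *ctx);             /* true if the agent answered */
    bool (*create_entities)(void *ctx);  /* true if every entity was created */
    void (*destroy_entities)(void *ctx); /* fini everything that was created */
    void *ctx;
} agent_hooks_t;

/* Advance the state machine by one step and return the next state. */
agent_state_t agent_step(agent_state_t state, const agent_hooks_t *hooks)
{
    assert(hooks != NULL);
    switch (state) {
    case WAITING_AGENT:
        return hooks->ping(hooks->ctx) ? AGENT_AVAILABLE : WAITING_AGENT;
    case AGENT_AVAILABLE:
        if (hooks->create_entities(hooks->ctx)) {
            return AGENT_CONNECTED;
        }
        /* Partial creation: clean up before retrying from scratch. */
        hooks->destroy_entities(hooks->ctx);
        return WAITING_AGENT;
    case AGENT_CONNECTED:
        return hooks->ping(hooks->ctx) ? AGENT_CONNECTED : AGENT_DISCONNECTED;
    case AGENT_DISCONNECTED:
        hooks->destroy_entities(hooks->ctx);
        return WAITING_AGENT;
    }
    return WAITING_AGENT;
}

/* Simulated link, used only to exercise the state machine off-target. */
typedef struct {
    bool agent_up;
    bool entities_alive;
} sim_link_t;

bool sim_ping(void *ctx) { return ((sim_link_t *)ctx)->agent_up; }

bool sim_create(void *ctx)
{
    sim_link_t *link = ctx;
    link->entities_alive = link->agent_up;
    return link->entities_alive;
}

void sim_destroy(void *ctx) { ((sim_link_t *)ctx)->entities_alive = false; }
```

Note that the AGENT_CONNECTED state only checks that the agent is reachable. It does not verify that each entity still exists on the agent.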

Communication with the agent is often over UDP, so isn't it possible to simply miss one of several init packets? In that case the agent would be operating normally, and we might not notice our missing service for a while.

Is it possible to have the XRCE protocol periodically re-announce the set of nodes, topics, etc. to the agent, so that it would self-recover? Or is it feasible to implement some kind of "Tell me how you are configured, and I'll compare it to my idea of what I think you should be doing" handshake in the protocol between the agent and micro-ROS?

cmraaron avatar Aug 09 '22 16:08 cmraaron

The ping feature can be configured to make N attempts with a given timeout, so that a single lost UDP packet is not mistaken for an outage: https://github.com/micro-ROS/rmw_microxrcedds/blob/895763c817c7c07ce0be08411924d7747c2acd2d/rmw_microxrcedds_c/include/rmw_microros/ping.h#L52
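As a rough illustration of why several attempts help, here is a small sketch of the arithmetic. It assumes each ping attempt fails independently with the same probability, which is an assumption (real packet loss is often bursty). `all_attempts_fail` is a hypothetical helper, not part of micro-ROS; on a target the attempt count would be the second argument to `rmw_uros_ping_agent(timeout_ms, attempts)`.

```c
#include <assert.h>

/* Probability that every one of `attempts` pings fails, assuming each
 * attempt fails independently with probability `p_fail`. */
double all_attempts_fail(double p_fail, unsigned attempts)
{
    assert(p_fail >= 0.0 && p_fail <= 1.0);
    double p = 1.0;
    for (unsigned i = 0; i < attempts; ++i) {
        p *= p_fail;
    }
    return p;
}
```

Under that assumption, a link on which 10% of ping attempts fail would report a false outage roughly once per thousand checks when 3 attempts are used, since 0.1 × 0.1 × 0.1 = 0.001.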

XRCE allows entities to be created with a REUSE flag. In micro-ROS we use REUSE together with REPLACE, so an existing entity is reused, or replaced if something has been modified: https://github.com/micro-ROS/rmw_microxrcedds/blob/895763c817c7c07ce0be08411924d7747c2acd2d/rmw_microxrcedds_c/src/rmw_publisher.c#L160

Maybe we could add an RMW API, something like refresh_entities(), that just tries to recreate the entities with REUSE.

More info here: https://micro-xrce-dds.docs.eprosima.com/en/latest/client.html#creation-policy-table
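To make the idea concrete, here is a minimal off-target model of the creation policy, as I read the linked table, together with a sketch of what a refresh could look like. Everything here is hypothetical: `status_t`, `agent_entity_t`, `create_with_policy`, and this `refresh_entities` are illustrative names, not the Micro XRCE-DDS or RMW API, and an `int` stands in for an entity's full configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes modelling the outcomes in the policy table. */
typedef enum {
    ST_OK,                 /* entity was created (or replaced) */
    ST_OK_MATCHED,         /* identical entity already existed and was reused */
    ST_ERR_MISMATCH,       /* REUSE only: entity exists but differs */
    ST_ERR_ALREADY_EXISTS  /* no flags: entity exists */
} status_t;

/* Agent-side view of one entity slot. */
typedef struct {
    bool exists;
    int config; /* stand-in for topic, type, QoS, ... */
} agent_entity_t;

/* Model of how the agent handles a creation request under each policy. */
status_t create_with_policy(agent_entity_t *slot, int requested,
                            bool reuse, bool replace)
{
    assert(slot != NULL);
    if (!slot->exists) {
        slot->exists = true;
        slot->config = requested;
        return ST_OK;
    }
    if (reuse && slot->config == requested) {
        return ST_OK_MATCHED;
    }
    if (replace) {
        slot->config = requested; /* delete and recreate */
        return ST_OK;
    }
    return reuse ? ST_ERR_MISMATCH : ST_ERR_ALREADY_EXISTS;
}

/* Re-request every expected entity with REUSE (optionally with REPLACE).
 * Returns how many entities the agent had to create or replace,
 * or -1 as soon as one request is rejected. */
int refresh_entities(agent_entity_t *slots, const int *expected, int n,
                     bool replace)
{
    int recreated = 0;
    for (int i = 0; i < n; ++i) {
        status_t st = create_with_policy(&slots[i], expected[i], true, replace);
        if (st == ST_OK) {
            ++recreated;
        } else if (st != ST_OK_MATCHED) {
            return -1;
        }
    }
    return recreated;
}
```

In this model a return value of 0 means the agent already held everything the client expected, while a positive value means something had been lost (or had changed) and was transparently restored.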

Could you please tell us if this would be ok for you?

pablogs9 avatar Aug 10 '22 06:08 pablogs9

Yes! That sounds ideal. It could work either with REUSE, which could then indicate whether something was missing, leaving it to the caller to tear everything down and try again, or with REUSE | REPLACE, which would return which resources were transparently created (replaced?), if my understanding is correct.

For example, this could be necessary in the unlikely case that the agent restarted so quickly that the ping didn't notice the outage.

cmraaron avatar Aug 10 '22 06:08 cmraaron

We currently have very little bandwidth for this kind of feature. It will certainly be on our roadmap, but do not expect it soon.

If you want to contribute this approach as a PR to the micro-ROS RMW, we can take a look.

Thanks!

pablogs9 avatar Aug 10 '22 06:08 pablogs9

I found this post while looking into why rmw_uros_ping_agent and other init functions can sometimes fail. Currently, we retry these in a loop, but the method suggested here definitely sounds like a good idea!

aditya2592 avatar Dec 12 '22 03:12 aditya2592

@pablogs9 I was going through the tutorial here, which mentions that setting the client key should allow for entity reuse. Does this already cover the case discussed here?

aditya2592 avatar Feb 16 '23 09:02 aditya2592