ROS2 Actions and Services hanging using YasminNode
Hello! Thanks for YASMIN, it’s been a great solution for managing state machines in ROS2. We’re using it in a fairly complex domestic robot setup and our current structure has one instance of a YasminNode handling everything, which includes multiple clients (we create a new state instance for each call, which might also contribute to the problem), a lot of helper nodes (vision, manipulation, speech) and Nav2 Actions.
We’re currently experiencing an issue where services and actions intermittently fail to return a response after successfully executing. This tends to happen frequently, especially when using the Nav2 Action state in a setup similar to the demos provided in the repository. We suspect the cause might be one YasminNode handling all those actions and services since it stops happening when we call the services with other nodes.
Any idea what the proper way of handling this is? Our next debugging step will be to use a single instance of each state, so that each state is limited to one client instead of creating a bunch, but I'd like to hear your opinion first. Thanks again for all your work!
Hi @gadorneles, which version of YASMIN are you using? Do you have your code in a public repo to take a look?
Hello @mgonzs13! Unfortunately we don't have a public repo, but I can send you a sample of the code that was giving us this error. We're using version 3.3.0, but this was already happening before we updated to this version. Here's a sample that is shorter than what we usually run, but it's the same idea:
def main():
    sm = StateMachine(outcomes=[SUCCEED, ABORT, CANCEL, TIMEOUT])
    sm.add_state(
        "SAY_SOMETHING_1",
        ServiceState(srv_name="/speech/ss/say_something", srv_type=SynthesizeSpeech),
        transitions={
            SUCCEED: "NAVIGATE_TO_POSE_1",
            ABORT: "NAVIGATE_TO_POSE_1",
            CANCEL: CANCEL,
            TIMEOUT: "NAVIGATE_TO_POSE_1",
        },
    )
    sm.add_state(
        "NAVIGATE_TO_POSE_1",
        ActionState("navigate_to_pose", NavigateToPose, feedback_cb=None),
        transitions={
            SUCCEED: "SAY_SOMETHING_2",
            ABORT: "SAY_SOMETHING_2",
            CANCEL: CANCEL,
            TIMEOUT: "SAY_SOMETHING_2",
        },
    )
    sm.add_state(
        "SAY_SOMETHING_2",
        ServiceState(srv_name="/speech/ss/say_something", srv_type=SynthesizeSpeech),
        transitions={
            SUCCEED: "ENABLE_DETECT_1",
            ABORT: "ENABLE_DETECT_1",
            CANCEL: CANCEL,
            TIMEOUT: "ENABLE_DETECT_1",
        },
    )
    sm.add_state(
        "ENABLE_DETECT_1",
        ServiceState(srv_name="/vision/fr/object_start", srv_type=Empty, timeout=5),
        transitions={
            SUCCEED: "NAVIGATE_TO_POSE_2",
            ABORT: "NAVIGATE_TO_POSE_2",
            CANCEL: CANCEL,
            TIMEOUT: "NAVIGATE_TO_POSE_2",
        },
    )
    sm.add_state(
        "NAVIGATE_TO_POSE_2",
        ActionState("navigate_to_pose", NavigateToPose, feedback_cb=None),
        transitions={
            SUCCEED: SUCCEED,
            ABORT: ABORT,
            CANCEL: CANCEL,
            TIMEOUT: TIMEOUT,
        },
    )
    try:
        outcome = sm(blackboard)
        yasmin.YASMIN_LOG_INFO(f"State machine finished with outcome: {outcome}")
    except KeyboardInterrupt:
        if sm.is_running():
            sm.cancel_state()
    if rclpy.ok():
        rclpy.shutdown()

if __name__ == "__main__":
    main()
Like I mentioned before, we fixed this by not using YasminNode to call these services and actions, but I'd like to know the root of the issue. We're also creating a node for each service/action at the moment, which I don't think is ideal.
Some questions:
- Are you using several nodes instead of one for all the services and actions?
- How many actions/services with the same name and message type are you creating at the same time (I mean, for example, how many Nav2 NavigateToPose action clients are you creating?)
- Are you passing each node to the action and service states?
- What do you mean that the actions and services fail? Do you have the logs?
> Are you using several nodes instead of one for all the services and actions?

We are right now: one node for each action and each service, which made it work. Before this we were using a single instance of the YasminNode for everything and the issue was happening.

> How many actions/services with the same name and message type are you creating at the same time (I mean, for example, how many Nav2 NavigateToPose action clients are you creating?)

We were creating around 5 of each; for NavigateToPose actions it would sometimes go up to 10 or so. Of course, only one is called at a time.

> Are you passing each node to the action and service states?

We're doing this right now to make it work. Before, we were using the default YasminNode.get_instance() function in those states.

> What do you mean that the actions and services fail? Do you have the logs?

Sorry, no logs to share right now, but the actions/services are sent and executed. In the context of navigation, the robot navigates completely fine, arrives at the goal, and the terminal shows Goal Succeeded, except that the action state never receives the feedback and is stuck forever. The same thing happens with different services.
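For readers following along: the symptom described here, a response that never arrives while the state blocks forever, is what a bounded wait guards against, as with the timeout=5 on ENABLE_DETECT_1 in the sample. The sketch below illustrates that idea in plain Python only. It is not YASMIN's implementation; `wait_for_response` and the outcome strings are hypothetical stand-ins defined locally.

```python
import threading
from typing import Optional

# Local stand-ins for outcome labels (assumed values, defined here only).
SUCCEED = "succeeded"
TIMEOUT = "timeout"


def wait_for_response(done: threading.Event, timeout: Optional[float]) -> str:
    """Block until `done` is set, or give up after `timeout` seconds.

    With timeout=None this waits indefinitely, which is the "stuck forever"
    behaviour when the response callback never fires.
    """
    # Event.wait returns True if the event was set, False if the timeout elapsed.
    return SUCCEED if done.wait(timeout) else TIMEOUT


# Case 1: the response callback never fires, so the bounded wait gives up.
never_answered = threading.Event()
lost_outcome = wait_for_response(never_answered, timeout=0.05)

# Case 2: a callback (simulated with a timer thread) sets the event in time.
answered = threading.Event()
threading.Timer(0.01, answered.set).start()
ok_outcome = wait_for_response(answered, timeout=2.0)

print(lost_outcome, ok_outcome)
```

A timeout does not explain why the response is lost, but it lets the state machine take a TIMEOUT transition instead of hanging.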
I have been reviewing the code. I'm not sure whether rclpy itself could be the problem, possibly because of having several clients for the same action/service in the same node. I tested creating only 2 and it worked fine. I am thinking of trying to reuse the same client, something similar to a singleton. Btw, which ROS 2 distro are you using?
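The "something similar to a singleton" idea could look roughly like the sketch below: a cache that hands out one shared client per (interface type, name) pair, so that ten NavigateToPose states would all share one client. This is only an illustration in plain Python and not what YASMIN actually does; `ClientCache` and `fake_factory` are hypothetical names, and in a real setup the factory would wrap whatever call creates the rclpy client.

```python
import threading
from typing import Any, Callable, Dict, Hashable, Tuple


class ClientCache:
    """Hands out one shared client per (interface type, name) pair."""

    def __init__(self, factory: Callable[[Hashable, str], Any]) -> None:
        # `factory` is a hypothetical callable that builds the real client.
        self._factory = factory
        self._clients: Dict[Tuple[Hashable, str], Any] = {}
        self._lock = threading.Lock()

    def get(self, interface_type: Hashable, name: str) -> Any:
        key = (interface_type, name)
        # The lock prevents two threads from creating the same client twice.
        with self._lock:
            if key not in self._clients:
                self._clients[key] = self._factory(interface_type, name)
            return self._clients[key]

    def __len__(self) -> int:
        return len(self._clients)


created = []  # records every real construction, for demonstration only


def fake_factory(interface_type: Hashable, name: str) -> object:
    # Stand-in for creating an action or service client on a node.
    created.append((interface_type, name))
    return object()


cache = ClientCache(fake_factory)
nav_a = cache.get("NavigateToPose", "navigate_to_pose")
nav_b = cache.get("NavigateToPose", "navigate_to_pose")
say = cache.get("SynthesizeSpeech", "/speech/ss/say_something")

print(nav_a is nav_b, len(cache), len(created))
```

Strings are used as interface types here to keep the sketch free of ROS dependencies; message classes would work the same way as dictionary keys.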
@gadorneles Could you try the new version 3.5.0 and check if the error persists?
Hello! I am using ROS2 Humble right now. I agree that having several clients in the same node for the same action/service is most likely the issue. Unfortunately I'm very busy right now but I'll get back to you as soon as I manage to test my setup with version 3.5.0.
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.