rcl_interfaces
rcl_interfaces copied to clipboard
Primary state error transitions
In the node_lifecycle design document, it shows the state machine for lifecycle nodes:
The red X for Error Raised on the active state indicates that the original author intended a transition from active to errorProcessing. This PR and the related ones to follow in rcl and rclcpp implement that functionality, while the related PR in demos shows its use.
I also submit that there should be a similar transition from inactive to errorProcessing.
My use case:
I have a lifecycle node that communicates with a piece of hardware that needs to be configured before use and thus a lifecycle node was an obvious choice. For safety reasons, however, the hardware can be hard-killed and reset, while the machine on which the ros node runs stays on. The node can recognize that the hardware has been power-reset. If I am in the active state, I need to go directly to the unconfigured state. Without the error transition, I need a bunch of special logic in the on_deactivate and on_cleanup callbacks, because if I self-transition in the node with deactivate without extra logic, I will be calling functions on the unconfigured hardware. Anyway, I think the error transition more correctly describes what happens. I also prefer to keep the externally available transitions external to the node.
Similarly, if my hardware is killed while the node is inactive, I will not want to call the cleanup code that could try to set parameters on the unconfigured hardware, I want to go through the error transition instead.
My original question asked to the community on the answers page is here. When I asked it, I did not find that this functionality was also asked for on the answers page by another user and that issue 547 on rclcpp had been filed until more searching.
Why is this dco bot whining at me when I have a signed-off-by line?
@thebyohazard
thanks for the contribution,
The red X for Error Raised on the active state indicates that the original author intended a transition from active to errorProcessing.
i think so too, in fact the following is described in lifecycle design,
Transition State: ErrorProcessing
It is possible to enter this state from any state where user code will be executed.
The node can recognize that the hardware has been power-reset.
IMO, i would not do with error processing since device state can be detected.
anyway, i really would like to hear from others.
I also submit that there should be a similar transition from inactive to errorProcessing.
Yes, I think that was an oversight in the graphic as there definitely could be user code there that could error as pointed out by @fujitatomoya
Originally there wasn't expected to be anything happening in the inactive state or configured states, but there may be other threads or activity that could cause transitions (such as hardware errors) that would change the status. So to that end adding a transition from Unconfigured to ErrorProcessing would also make sense since something could go wrong in that state too, potentially recoverable with some error recovery.
A proposed PR to the design to fill that in for discussion would be a good idea. Once resolved there we can update this to match the resolution and fill in the implementation here and the others linked.
Is there any movement on this and the accompanying PRs?
@SteveMacenski: Yes! I'm in the process of doing the design PR that Tully suggested and I hope to have that done today. Coronavirus has been meddling in my plans lately.
Similarly, if my hardware is killed while the node is inactive, I will not want to call the cleanup code that could try to set parameters on the unconfigured hardware, I want to go through the error transition instead.
@thebyohazard I think this is very application dependant. You might find other cases where the on_cleanup() is necessary before trying to configure again. Take the example of a hypothetical robot with enable_drives and disable_drives() interfaces called in the respective transitions. Some hardware might need the connection to reset to be able to configure again.
@thebyohazard
friendly ping, are you still working on this? i think this makes sense but it would be better to discuss and have consensus on design 1st.
Any progress on this? IMO some sort of fix like this is strongly needed for lifecycle nodes to function as intended. As it stands adding primary state exception handling from an extending class is very roundabout.
I was going to take over, see https://github.com/ros2/design/pull/283. but i do not have time to do it soon.
@fujitatomoya I really need this feature and I will be happy to help. Could you please summarize what's missing before this can be merged? /cc @tfoote @clalancette @Karsten1987
current status is that some PRs are under Requested Change and 2nd review from maintainer is required, the original author is not available at this moment.
@fujitatomoya So there's no way the community can help? I mean, it's been 2 years and this is a critical part of the design that is missing implementation...
I think i was going to take over and borrow the code from @thebyohazard to make PR since author is not respondng, but i do not have time to do that right now. it would be always and really nice to have the help from community!
it would be always and really nice to have the help from community!
Glad to hear it. So, could you please summarize what's missing so I can give it a shot?