Make RosTopicSubNode a StatefulActionNode and handle different reliability QoS configurations
There were two things I wanted to improve about the RosTopicSubNode:
- The requirement that
onTickmust strictly either returnSUCCESSorFAILUREdoesn't match up well with the process of waiting for a message to be received. Practically there are actually three states: the message has not yet been received and we should continueRUNNINGwhile we wait for it, or the message has been received and the user's code determines if the contents of the message meanSUCCESSorFAILURE. - As currently implemented it can't subscribe to publishers that use the BestEffort reliability QoS config, which is often used when publishing sensor data.
Here's what I did to achieve these goals:
- Make RosTopicSubNode a StatefulActionNode. The user's implementation of
onTickcan now returnRUNNING, which allows explicitly waiting for a new message to be received. This allows the behavior tree to be simpler if the goal is to get a new message before continuing. - Always create the subscriber during the first tick. This solves two problems:
- We can get information about the publishers on the specified topic and dynamically determine the correct QoS settings to use for the subscriber. This lets us handle both Reliable and BestEffort publishers without adding more fields to
RosNodeParamsand requiring the user to set the QoS themselves. This is a partial fix for #14. - Getting the topic name only at runtime requires less handling for special situations than allowing both getting it in the constructor and later getting it at runtime.
- We can get information about the publishers on the specified topic and dynamically determine the correct QoS settings to use for the subscriber. This lets us handle both Reliable and BestEffort publishers without adding more fields to
- Add some tests to exercise these different situations.
I've kept this as a draft for now because I have a few design questions (see comments below).
Handling different QoS configurations is something we've also hit up against.
However I don't really understand the need to wait for a message. Personally I feel like the behaviour tree pattern for handling this is to add a decorator before the node retryUntilSuccessful if you need to wait for the result. If you need to distinguish between a bad result and no read, you can always add an extra port in your node that describes failure reason.
However I don't really understand the need to wait for a message.
I've found that this is a very common situation when handling robot state messages or sensor data in a behavior tree. In those cases it's important to make sure that the message was created only after entering a specific part of a tree, and it's very convenient to know that if a specific node succeeds then the rest of the tree has access to current sensor data.
To provide a specific example that I've run into recently: the driver node for the Zivid 3D camera advertises a Trigger service server that initiates a scan and publishes the captured point cloud on a separate topic instead of returning it in the service response message. If I'm able to make the subscriber BT node block until it's received the message, I can write a sequence that triggers a scan and then grabs the published point cloud with just two BT nodes. If I instead must loop over a BT node until it succeeds while also disambiguating its reason for failure each tick then I need a much more complicated tree.
Personally I feel like the behaviour tree pattern for handling this is to add a decorator before the node retryUntilSuccessful if you need to wait for the result. If you need to distinguish between a bad result and no read, you can always add an extra port in your node that describes failure reason.
I think one of the guiding themes of the BehaviorTree project is to provide C++ libraries that enable creating concise and readable behavior trees -- the Switch nodes are a great example of this. While it's certainly possible to handle this sort of thing using controls, decorators, and scripts, that approach makes the end user responsible for a lot more behavior tree complexity and duplicated C++ code than if the BT node itself provided a solution.
Yes it makes the tree a bit more verbose, but it keeps the usage pattern more consistent. There are always different opinions and usage patterns for things. Personally I am a big fan of RISC architectures, IE keep basic building blocks simple, reusable, and consistent even at the cost of verbosity further down. This can always be hidden and reused in a sub tree for these types of use cases.
I wonder if this should be a different node - a variation on topic nodes. This would be similar to how there are several variations of Sequence and Fallback with subtle but important differences in behavior.
As @tony-p said, different API users may want to handle communication failure differently, with different tradeoffs. E.g. if a user naively negates such a RosTopicSubNode to negate the condition, then communication failure turns into a success. E.g. a condition throwing exceptions could bring down the tree, when the user wants to run communication failure fallbacks instead. And message staleness is a whole other can of worms.
So, I will leave my user story here. In the stack I'm working on, the general philosophy is for the whole system to be up before a behavior tree is run. Each child class custom behavior:
- Doesn't change ROS interface names at runtime. I wonder if this is a rare edge case (but is supported because there are no const BT ports).
- Checks on construction whether its ROS interfaces are up, and throws an exception if they aren't. This exception makes our tree execution server fail the action. This catches missing parts of the stack early, instead of failing when the robot is in the middle of a task, where it may be tedious to reset everything. In this stack, the existing node works (as a parent class) and is simple to use.
- onTick, a missing message is exceptional, because the whole stack was supposed to be up. So it throws.
@schornakj Did you end up finishing this up to your satisfaction or finding a different approach? This seems like something it'd be good to have.
@gsalinas I think this https://github.com/BehaviorTree/BehaviorTree.ROS2/pull/99 solution is more flexible and I would suggest to use that one?
@tony-p Oh, thank you so much for the tip! In my case I'm specifically interested in having it be a StatefulActionNode so that I can guarantee a non-stale message, not so much in the QoS profile piece.
@schornakj Did you end up finishing this up to your satisfaction or finding a different approach? This seems like something it'd be good to have.
I made this PR and then didn't get an opportunity to test it to my satisfaction on my actual system for several months. Fortunately, I'm finally in a place where I can return to this. There are a handful of edge cases I need to correct -- in particular the original implementation doesn't correctly handle publishers with TransientLocal durability.
I wonder if this should be a different node - a variation on topic nodes. This would be similar to how there are several variations of Sequence and Fallback with subtle but important differences in behavior.
I'm also leaning towards revising this into a variation on a node instead of an in-place modification of the existing node type. That would let this avoid breaking backwards compatibility with existing users of the topic subscriber BT node.