rosbag2
rosbag2 copied to clipboard
Expose Recorder::pause/resume to Python
Description
Expose the recorder's pause and resume API to the Python API for use by tooling like rqt_bag
Original description for context
Description
Currently, there is code in rqt_bag that replicates much of the functionality provided by rosbag2_transport_py.record. However, the rqt_bag GUI allows the user to pause and resume recording. In addition, while recording, the rqt_bag UI updates the timeline to show newly recorded messages. I'd like to replace all of the recording-related code in rqt_bag by using the rosbag2_transport_py class instead. This would eliminate somewhat duplicate code and unify the recording behavior. It would also have the additional benefit of making all of the features, such as compression and QoS overrides available to rqt_bag users when recording. To make this change, I'd need the rosbag2_transport::Recoder class (and associated Python wrapper) to support pause/resume and the ability to set a callback to be invoked for each recorded message.
Related Issues
- None
Completion Criteria
rosbag2_transport::Recorder class (C++)
- pause and resume methods
- add_listener method to set a callback that is invoked upon each newly-recorded message
rosbag2_transport_py (Python)
- pause and resume methods exposed via Python interface
- add_listener methods exposed via Python interface
Associated tests
- Tests for the new functionality in rosbag2_transport::Recorder and rosbag2_transport_py
Implementation Notes / Suggestions
Currently, the Recorder class has a record_messages method that spins the node:
void Recorder::record_messages() const
{
spin(node_);
}
pause() and resume() could set a flag which is checked in record_messages. If paused, incoming messages would not be processed. Otherwise, if not paused, spin some to process incoming messages and recheck.
Alternatively, to minimize impact to the existing function, there could be an incoming flag to record_messages to indicate whether the recording can be controlled. If not, the normal spin(node_) is used. If set, an approach described above could be used.
Testing Notes / Suggestions
- Test callback mechanism - Set a callback using add_listener. Publish messages and keep track of published messages using the callback. Ensure that all published messages have been received via the callback.
- Test pause/resume - Publisher publish messages at a particular interval. Pause the recording for a specific time and then resume. Open the resulting database and ensure that the gap is present. Do the same thing, but use a callback mechanism to keep track of recorded messages and then ensure the gap is present.
There's currently some work on the way in here: https://github.com/ros2/rosbag2/pull/443 Not sure what's the current state of this is though, maybe @mabelzhang can shed some light into this.
Yeah so I no longer have allocation to work on rosbag2. @danthony06 is taking a stab at the LifeCycle part of #443, and if he needs help I think we were going to help push it along, but more likely @mjeronimo. It won't be me.
To summarize #443, it has the LifeCycle part and a keyboard control part, and they are to be in two separate PRs. The keyboard control needs to be in a Python package, probably a separate package so that we don't mix C++ and Python.
Ok, thanks Mabel. I'll take a look at #443. Perhaps we could separate some of the PRs into more basic functionality (rosbag2_transport level) and then app-level changes (like ros2bag and rqtbag).
@danthony06 is #443 something that you are actively pursuing?
Yes, I'm working on it. I've started working on the lifecycle changes, but maybe it's worth skipping that for now and handling it in a later pull request
In my understanding, LifeCycle was the main contribution of that PR. We wanted the pause feature to work by having LifeCycles. Having rosbag2 be LifeCycle was also sought after by some other features I believe. So getting LifeCycles working is the priority in that PR. Then everything else can follow - pause/resume comes with it for free, and then keyboard control later on in another PR.
So is the hope that when we pause the node, we send it to the Inactive
state? I see in the design guide that if a node is in this state, it is not supposed to functional service requests. Does that mean we will not be able pause playback, change the configuration of the node, say the playback rate, and then resume? That would seem counter-intuitive.
However it's implemented, I recommend getting in the most basic functionality first: pause/resume first in a PR, along with tests. I could also use an add_listener-type function so that I can be called upon each message (for display in my case). After this, we could follow with PRs to address ros2bag and rqt_bag. On the other hand, if this PR is closed to approval/merging, I can also add the add_listener functionality after integration of #443.
@danthony06 Hmm not sure about the specifics of how LifeCycle nodes are designed. @Karsten1987 is the expert on that.
LifeCycle is the preferred way to implement this the last I checked with @Karsten1987. The basic features are already working in #443 - pause, resume, tests. But there were some detailed review comments that needed to be addressed, like how to more properly handle timing, which @danthony06 is looking into. The ros2bag addition is really small and was required to have the pause/resume feature at all, as mentioned in one of the comments. Without the ros2bag additions to start the playback paused, there was no way to check that it actually pauses (other than doing it internally via the tests, which I think skips the LifeCycle part, don't remember), because nothing shows up in ros2 lifecycle
(that's another question I had and have no answer to). So that's why those were coupled in that PR.
I see in the design guide that if a node is in this state, it is not supposed to functional service requests
That's an implementation detail. All you've get with the lifecycle node is a set of callbacks attached to each change of state. That being said, a call to pause could in fact trigger a on_deactivate()
callback in which one would pause the playback. A call to resume triggers the on_activate()
callback where one can resume playback.
The general idea of a lifecycle node would be that you wouldn't reconfigure your system while being active/inactive. For that, one would shutdown the node first and re-configure it. But that's just a recommendation/guideline. That doesn't mean you can't react to changes in parameters (such as playback speed) within the on_activate()
call.
Yes, IMO, parameter changes can happen in whichever state is appropriate. For example, with the Navigation stack, during the on_configure callback I tended to do the heavyweight memory allocations, etc. Then, other lighter weight parameters could be applied during the on_activate callback. Finally, Active state configuration should be very infrequent and fast, such as modifying a simple variable (get in and get out) because you'd have to acquire a mutex, etc. and block the running node. This was the convention used, but in general, no particular convention is not enforced.
So, using a Lifecycle node for the implementation and mapping the recorder's pause to the node's deactivate() and resume to activate seems reasonable to me. Personally, I found Lifecycle nodes to sometimes be a bit of a pain and might've used something lighter-weight on this (as mentioned in my original comment), but that's fine as long as it works :)
I talked with some co-workers about what they would expect the behavior to be. The consensus is:
- Playback doesn't start until the node gets to the active state.
- Within the active state, playback can be paused and started.
- If the node goes to the inactive state, playback pauses.
- If the node reenters the active state, playback will remain paused until commanded to start again.
Does this seem reasonable?
On Thu, Nov 5, 2020, 3:45 PM Michael Jeronimo [email protected] wrote:
So, using a Lifecycle node for the implementation and mapping the recorder's pause to the node's deactivate() and resume to activate seems reasonable to me. Personally, I found Lifecycle nodes to sometimes be a bit of a pain and might've used something lighter-weight on this (as mentioned in my original comment), but that's fine as long as it works :)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ros2/rosbag2/issues/558#issuecomment-722663534, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBZWHFE55QF2V24QDD2WWTSOMMF7ANCNFSM4TLXMOGQ .
Well, if you use this route, using a lifecycle node is really orthogonal to controlling playback (pause/resume). That is, using a lifecycle node can be done separately from the playback control. This then raises the question: what is the purpose in this situation of using lifecycle nodes? Are there some other requirements, or was the idea originally to map the pause/resume to lifecycle states?
@Karsten1987 requested the lifecycle node change. I think I agree with your concerns, that we have two control mechanisms, and the lifecycle control mechanism has side effects that impact the play/pause functionality. But if I understand the lifecycle node design, then I think the behavior will be counter-intutive to the users, because if I were to put the node into the inactive state to pause it, then it stops processing other callbacks, which means I could not do things like reconfigure the publishing rate, or other settings through service callbacks, which seems like a common use case.
Well, first a few things about lifecycle nodes that make things a bit tricky:
- Publishers created by a lifecycle node have to be explicitly activated. This typically happens in the node's on_activate callback. For example, in the AMCL node in the Naviagtion2 stack, AMCL activates its publishers like this: https://github.com/ros-planning/navigation2/blob/main/nav2_amcl/src/amcl_node.cpp#L276-L278.
- Subscribers don't have to be activated. This means that a callback can come in at any time, no matter the lifecycle state. To continue with the AMCL example, I therefore added a Boolean variable to track where the node was in the active state. When not in the Active state, it could avoid processing any incoming callbacks, like this: https://github.com/ros-planning/navigation2/blob/main/nav2_amcl/src/amcl_node.cpp#L557-L562
- Services also don't have to be activated. So, you can update a node's parameters in any state. With any incoming service request, you should also consider whether or not they should be processed depending on the node's state. Also, the lifecycle mechanisms are implemented using services, so you can't really disable services in the Inactive other other states or a lifecycle node wouldn't be able to receive any state change requests.
That being said, I think the question comes down to this: Do you want to map this node's application-level Paused and Resumed/Active states to the lifecycle node's Inactive and Active states (which I think is what Karsten was suggesting), or keep the node bring-up and application-level states separate? If you go for the latter, then the lifecycle node stuff isn't really required. You can certainly add it, but it wouldn't be addressing the core functionality you're going for.
Thanks for the informative feedback. With that information, I think it makes sense to map the pausing directly to the Active and Inactive states then.
On Tue, Nov 10, 2020, 12:00 PM Michael Jeronimo [email protected] wrote:
Well, first a few things about lifecycle nodes that make things a bit tricky:
- Publishers created by a lifecycle node have to be explicitly activated. This typically happens in the node's on_activate callback. For example, in the AMCL node in the Naviagtion2 stack, AMCL activates its publishers like this: https://github.com/ros-planning/navigation2/blob/main/nav2_amcl/src/amcl_node.cpp#L276-L278 .
- Subscribers don't have to be activated. This means that a callback can come in at any time, no matter the lifecycle state. To continue with the AMCL example, I therefore added a Boolean variable to track where the node was in the active state. When not in the Active state, it could avoid processing any incoming callbacks, like this: https://github.com/ros-planning/navigation2/blob/main/nav2_amcl/src/amcl_node.cpp#L557-L562
- Services also don't have to be activated. So, you can update a node's parameters in any state. With any incoming service request, you should also consider whether or not they should be processed depending on the node's state. Also, the lifecycle mechanisms are implemented using services, so you can't really disable services in the Inactive other other states or a lifecycle node wouldn't be able to receive any state change requests.
That being said, I think the question comes down to this: Do you want to map this node's application-level Paused and Resumed/Active states to the lifecycle node's Inactive and Active states (which I think is what Karsten was suggesting), or keep the node bring-up and application-level states separate? If you go for the latter, then the lifecycle node stuff isn't really required. You can certainly add it, but it wouldn't be addressing the core functionality you're going for.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ros2/rosbag2/issues/558#issuecomment-724868693, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBZWHHJ7IJYB7P26PDVLATSPF5SRANCNFSM4TLXMOGQ .
One big thing I am hoping to achieve with these lifecycle nodes is that I want to prepare the setup before starting the replay. We've got a bunch of very flaky system tests which fail due to a mismatch on the match between subscription and publishers. With an option of not starting the replay/recording straight up, but set up your system correctly, I am hoping to achieve a reliable connection and guarantee a constant data transfer between subscriber and publisher.
Ok, I see. Sound like a good reason to introduce the lifecycle nodes then.
The playback pause/resume feature was finally implemented without lifycycle nodes.
It looks that for the recorder it would be pretty easy to implement the feature, simply don't write to the database in the subscription callbacks when the recording is paused. There's also no "lifecycle subscription", so there wouldn't be much benefit of using that AFAIS.
I'm planning to work on this feature, I would appreciate input from maintainers before starting.
@ivanpauno Your proposal sounds good to me. :+1: And I am not a fun of the idea with lifecycle node.
This issue is overloaded. I will suggest that we split it in 2:
- Expose pause/resume recording to rosbag2_py
- Provide a callback interface for each recorded message
I reworded this issue and split the callback into #1075
@emersonknapp I'm going through my issue backlog and was taking at look at this issue. It seems since pause/resume has been implemented and #1075 has been created, this issue can be closed. Do you agree?
Yes, I agree!