rosbag2
Rosbag2 silently stops recording image and pointcloud2 topics
Description
rosbag2 silently stops recording image and pointcloud2 topics shortly after recording starts. After roughly 10 to 60 seconds, the image and pointcloud topics are no longer recorded, while all of the 'smaller' topics appear to continue recording.
Expected Behavior
I expect to be able to record data until my hard drive is full. :)
Actual Behavior
When recording lidar and camera topics, no error messages are displayed, but those topics stop being recorded.
System
- OS: Ubuntu 20.04
- ROS 2 Distro: Foxy
- Version: master (f1d1145de6762)
Additional context
I have ruled out sqlite3 as the root cause in this particular case by making a small modification to my_test_plugin (which could probably use a better name, but that's for another day...):
src/ros2/rosbag2/rosbag2_storage/test/rosbag2_storage/test_plugin.hpp:
int skipcnt_ = 0;
std::map<std::string, int> msgCount_;
modules/common/src/ros2/rosbag2/rosbag2_storage/test/rosbag2_storage/test_plugin.cpp:
// Requires <iomanip> and <iostream>.
void TestPlugin::write(const std::shared_ptr<const rosbag2_storage::SerializedBagMessage> msg) {
  // Tally every message handed to the storage plugin, keyed by topic name.
  ++msgCount_[msg->topic_name];
  // Every 100th message, print the running per-topic totals.
  if (!(++skipcnt_ % 100)) {
    std::cout << std::endl << "===================" << std::endl;
    for (const auto & x : msgCount_) {
      std::cout << std::left << std::setw(40) << x.first << std::right << std::setw(6)
                << x.second << std::endl;
    }
  }
}
This snippet simply counts every message received by the storage plugin and displays the count periodically. What I see with this code is that all topics are received at the start of recording, but after a random amount of time the lidar and image topics stop. rviz continues to receive sensor data, so I'm pretty sure that the publishers are functional.
While writing this up, I just noticed that all of my image and lidar topics started recording again after about 10~15 minutes. They stopped recording again after about a minute or so.
When the lidar and camera topics stop recording, they always start and stop at the same time. The 'smaller' topics never stop.
Not super sure where to look next, so any hints would be appreciated.
Fixing the memory leak seems to have resolved part of the problem, but not all of it. The 'my_test_plugin' ran for ~9 hours without losing connection to any of the topics, which is a first. The sqlite3 plugin is still dropping the 'large' topics after 30~60 seconds of recording.
I moved my trace output to record.cpp so that it works with both plugins. I have found it quite insightful, and it might be worth adding as a --flag to rosbag record, with some modifications to make it print periodically based on time rather than message count. If there's support, I might whip up a PR for it. https://github.com/dawonn-haval/rosbag2/commits/foxy
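A minimal sketch of what a time-based version of this trace could look like. `TopicCounter` and its methods are hypothetical names used only for illustration and are not part of rosbag2; the report layout copies the column widths from the test plugin snippet above.

```cpp
#include <chrono>
#include <iomanip>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of rosbag2): counts messages per topic and
// signals when a report is due based on elapsed time, not message count.
class TopicCounter
{
public:
  using Clock = std::chrono::steady_clock;

  explicit TopicCounter(std::chrono::milliseconds interval)
  : interval_(interval) {}

  // Record one message for `topic` at time `now`. Returns true when at least
  // `interval_` has elapsed since the previous report (or since the first message).
  bool count(const std::string & topic, Clock::time_point now = Clock::now())
  {
    ++counts_[topic];
    if (!started_) {
      last_report_ = now;
      started_ = true;
      return false;
    }
    if (now - last_report_ >= interval_) {
      last_report_ = now;
      return true;
    }
    return false;
  }

  // Format the running per-topic totals, one topic per line.
  std::string report() const
  {
    std::ostringstream out;
    out << "===================\n";
    for (const auto & entry : counts_) {
      out << std::left << std::setw(40) << entry.first
          << std::right << std::setw(6) << entry.second << "\n";
    }
    return out.str();
  }

  // Total messages seen so far for `topic` (0 if never seen).
  int total(const std::string & topic) const
  {
    const auto it = counts_.find(topic);
    return it == counts_.end() ? 0 : it->second;
  }

private:
  std::chrono::milliseconds interval_;
  Clock::time_point last_report_{};
  bool started_ = false;
  std::map<std::string, int> counts_;
};
```

In the recorder's write path, one could call something like `if (counter.count(msg->topic_name)) { std::cout << counter.report(); }`. A topic whose total stops increasing between reports would then be easy to spot. Note that this sketch only checks the clock when a message arrives, so it prints nothing if every topic stalls at once.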
With this trace, I noticed that the 'small' topics are processed very slowly while the 'large' topics are recording and they speed up significantly as soon as the 'large' topics disconnect.
My guess is that the sqlite3 back-end is capping out and that's somehow causing the subscribers to fail silently. I have to spend more time reading the code for the sqlite3 plugin to figure out exactly how this system is supposed to work.
Note for investigation: we may want to add a "WARN" when messages are dropped in the queue - we do know when this happens. That may be what's happening for this case.
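To illustrate the idea only, here is a sketch of a bounded queue that counts and warns about dropped messages. `BoundedQueue` is a hypothetical class, not the rosbag2 queue, and the warning cadence (first drop, then every 1000th) is an arbitrary assumption chosen to avoid flooding the log.

```cpp
#include <cstddef>
#include <deque>
#include <iostream>

// Hypothetical bounded FIFO (not the rosbag2 implementation): when full,
// new items are rejected, counted, and reported with a rate-limited warning.
template<typename T>
class BoundedQueue
{
public:
  explicit BoundedQueue(std::size_t capacity)
  : capacity_(capacity) {}

  // Returns true if the item was queued, false if it was dropped.
  bool push(const T & item)
  {
    if (items_.size() >= capacity_) {
      ++dropped_;
      // Warn on the first drop and then on every 1000th drop.
      if (dropped_ == 1 || dropped_ % 1000 == 0) {
        std::cerr << "[WARN] queue full, dropped " << dropped_
                  << " message(s) so far" << std::endl;
      }
      return false;
    }
    items_.push_back(item);
    return true;
  }

  // Removes the oldest item into `out`. Returns false if the queue is empty.
  bool pop(T & out)
  {
    if (items_.empty()) {
      return false;
    }
    out = items_.front();
    items_.pop_front();
    return true;
  }

  std::size_t dropped() const {return dropped_;}
  std::size_t size() const {return items_.size();}

private:
  std::size_t capacity_;
  std::size_t dropped_ = 0;
  std::deque<T> items_;
};
```

With a warning like this, a consumer that cannot keep up (for example a slow storage back-end) would show up in the log instead of failing silently.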
Is this still an issue? We faced a similar problem recording a standard IMU message for 3 hours. We found that it stopped after 15 seconds and we only collected 10k messages (at a frequency of 800 Hz).
The timestamp correctly claimed 3 hours, but something stopped the actual recording.
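As a rough sanity check on those numbers, the sketch below assumes a perfectly constant 800 Hz rate and treats "10k" as exactly 10,000 messages. Both are simplifications, and the function names are made up for illustration.

```cpp
#include <cstdint>

// Messages expected from a topic publishing at `rate_hz` for `duration_s` seconds,
// assuming a perfectly constant rate.
constexpr std::int64_t expected_messages(double rate_hz, double duration_s)
{
  return static_cast<std::int64_t>(rate_hz * duration_s);
}

// Seconds of data that `count` messages represent at `rate_hz`.
constexpr double covered_seconds(std::int64_t count, double rate_hz)
{
  return static_cast<double>(count) / rate_hz;
}
```

Under these assumptions, `expected_messages(800.0, 3 * 3600.0)` gives 8,640,000 messages for a full 3 hour recording, while `covered_seconds(10000, 800.0)` gives 12.5 seconds. That is roughly consistent with the recording stopping after about 15 seconds.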