rosbag2
Data loss when using compression (ros2 bag record)
Description
When recording large bags with ros2 bag record --max-bag-size=2000000000 --compression-mode file --compression-format zstd, topics get lost during compression.
Expected Behavior
When a new bag file is opened and the old one gets compressed, I would expect the new bag to contain all published topics (including those published while the compression is running).
Actual Behavior
During the compression of the just-closed bag file (closed due to max-bag-size), no messages are recorded in the new bag.
To Reproduce
- Start a system producing large amounts of data (in my case a camera (1280x720 @ 10 fps) and motion data from a Bluetooth device)
- Record the raw image topics and motion data with:
  ros2 bag record --max-bag-size=2000000000 --compression-mode file --compression-format zstd
- Replay the recorded bag
System
- OS: Ubuntu 18.04
- ROS 2 Distro: Foxy (built from source)
- Version: ros2
Additional Information
When recording bags without compression this issue does not occur.
Suspicion
Is it possible that the compression of the bag runs in a new thread rather than a separate process? As you probably know, in Python a thread cannot be distributed to a different CPU core; only a process can. If recording and compressing are two threads running in the same process, could the compression thread starve the recording thread by consuming all the resources of that single CPU core?
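To make the suspicion concrete, here is a minimal Python sketch of the suggested fix: running CPU-bound compression in a separate process so it cannot hold the GIL against a recorder thread. This is only an illustration of the idea, not rosbag2's actual implementation (which, as noted below, is C++); zlib stands in for zstd since it is in the standard library.

```python
# Sketch: offload CPU-heavy compression to a separate process so that
# a Python recorder thread is not starved by the GIL. zlib stands in
# for zstd; none of this mirrors rosbag2's real (C++) internals.
import zlib
from concurrent.futures import ProcessPoolExecutor


def compress_bag(data: bytes) -> bytes:
    # CPU-bound work; run inside the same process as a thread, this
    # would hold the GIL and compete with the recorder for one core.
    return zlib.compress(data, level=9)


def main() -> None:
    payload = b"sensor data " * 100_000
    with ProcessPoolExecutor(max_workers=1) as pool:
        # The recorder thread could keep writing messages while the
        # worker process compresses the closed bag file elsewhere.
        future = pool.submit(compress_bag, payload)
        compressed = future.result()
    # Round-trip check: decompression recovers the original bytes.
    assert zlib.decompress(compressed) == payload
    print(f"{len(payload)} -> {len(compressed)} bytes")


if __name__ == "__main__":
    main()
```

With a thread pool instead of a process pool, the compression would run concurrently but not in parallel with other Python bytecode, which is exactly the starvation scenario described above.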
Thank you!
Possibly related to https://github.com/ros2/rosbag2/issues/973
For context, the threading logic all happens in a C++ layer - the Python CLI is only a thin wrapper around calling the C++ core.
You mention you're building from source: are you using the foxy branch, or the foxy-future branch? I would recommend foxy-future, as it has many performance improvements and bugfixes that could not be released officially into Foxy due to API breakage.
This can be related to #936, #866 and #647.
> For context, the threading logic all happens in a C++ layer - the Python CLI is only a thin wrapper around calling the C++ core.
Thank you for the clarification about the Python CLI wrapping the C++ logic. I did not know that this is how it worked.
> You mention you're building from source, are you using the foxy branch, or the foxy-future branch? I might recommend foxy-future as it has many performance improvements and bugfixes that could not be released officially into Foxy due to API breakage
Actually, I am not quite sure which branch. My work is largely based on the ros.foxy.Dockerfile from dusty-nv/jetson-containers, which uses rosinstall_generator --deps --rosdistro foxy ros_base ... to fetch the repos.
Update: I am in fact using the foxy branch. I tried building the foxy-future branch, but rosinstall_generator only accepts the base branch names (foxy, galactic, ...).
This is a very concerning bug - data loss is a worst case scenario for a data recording tool. Has anyone tried to repro/investigate it in galactic/humble/rolling?
I haven't investigated it myself, but looking at the Dockerfile in use, the original reporter is likely using the foxy branch of this repository. That branch has known performance and data loss issues, which is why we recommend the foxy-future branch there. It would be interesting to see if the original problem can be reproduced with the foxy-future branch, which is much closer to what is in Galactic.
Good point.
@chrmel have you tested this in Galactic?
@amacneil I tried building from source with the galactic branch, but it did not succeed. I can try testing with the pre-built packages.
@amacneil, @clalancette, @emersonknapp
Ok, I tested my setup with the pre-built Debian packages for the foxy and galactic distros.
Testing with sample data
Every step in the data represents the time at which a new split bag file is created.

foxy
Recording data with ros2 bag record --max-bag-size=500000000 --compression-mode file --compression-format zstd /image_raw /image_raw/compressed /motion.
Same issue as described when built from source: every time a new split bag file starts, there is a data gap.
Play the bag with ros2 bag play --topics=/motion --rate=1.0 rosbag2_foxy/

galactic and rolling
Recording data with ros2 bag record --max-bag-size=500000000 /image_raw /image_raw/compressed /motion.
I was not able to reproduce the problem BECAUSE a different problem occurred: split bag files cannot be properly played in Galactic (with or without compression). When playing the bags, only the last file of the sequence of split files is played properly. Preceding files seem either to be skipped or to have their data squashed at the beginning of the last split bag file.
Play the bag with ros2 bag play --read-ahead-queue-size 50000 --topics=/motion --rate=1.0 rosbag2/. Only the data from the last split bag file is played (you can see the last distinctive ripple in the data).

The issue of only the last bag of a split being played was found before and is described in https://github.com/ros2/rosbag2/issues/966
Notes:
- With SQLite3 there is a current design limitation and suboptimal performance when --max-bag-size differs from 0. See https://github.com/ros2/rosbag2/issues/647#issuecomment-1855145280. For better performance it is recommended to either use the MCAP file format or not use the --max-bag-size parameter with the SQLite3 backend.
- Yes. There is a known issue that the compression threads may consume all CPU resources, so that the recording threads are starved and messages are lost. The solution is being done in "Add option to set compression threads priority" #1457. However, there is no CLI parameter yet for the compression_thread_priority option; it will only be available via node parameters for the composable node. A follow-up PR adding a CLI parameter is welcome.
- Closing this issue as stale and since workarounds already exist.
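As a sketch of how the node-parameter route might look once #1457 is available: ROS 2 parameters are typically supplied via a YAML params file passed to the launch of the composable recorder node. The parameter name compression_thread_priority comes from that PR; the node name below and the value semantics are assumptions, so check your rosbag2 version and launch setup before relying on them.

```yaml
# Hypothetical params file for the composable rosbag2 recorder node.
# "/rosbag2_recorder" is a placeholder node name; adjust to match the
# name used in your launch/composition setup.
/rosbag2_recorder:
  ros__parameters:
    # Lower priority for compression threads so the recording threads
    # are not starved. Accepted values depend on the OS scheduler.
    compression_thread_priority: -1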