rosbag2
rosbag2 copied to clipboard
[humble] How to avoid frames dropped without buffsize and chunksize options?
Hi Everyone,
I am recording IMU topic at 200MHz together with /camera/left/image_raw and /camera/right/image_raw at 20Hz. However, I observed certain frame drops of either left or right camera topics :
I even throttled the image topics to 5Hz but was still facing the drops:
The resolution of the image frame is 1280x720.
In ROS1 version, there's buffsize and chunksize that could be used to solve this issue I suppose. However, they are not available anymore in the ROS2 version.
There's also another suggestion to override the QoS settings of ros2 bag record. Unfortunately, I received the Incorrect QoS compatibility warnings if I set the reliability of the relevant topic to reliable.
Could someone give me an advice, please ?
Thanks in advance and best regards, Khang
Did you try the --max-cache-size option for ros2 bag record? That may not help if recording just can't keep up with the data input rate. You may have to record topics in separate bags/processes.
@l4es The "recording IMU topic at 200MHz together with /camera/left/image_raw and /camera/right/image_raw at 20Hz" is not a very good idea at first hand. At least because camera frames will delay or/and will cause frame drops on the IMU topic.
I recommend recording high frame rate topics as IMU in a separate bag file and then gluing them together or using distributed replay.
Hi @christophebedard,
I did try --max-cache-size 104857600 but drops still happened.
Hi @MichaelOrlov,
Indeed I could also see the IMU drops as well. Thanks for your suggestion to split the bag file per topic. May I know if this was the same approach with which you'd been able to record up to the 2.6 GBytes/Sec from 8 cameras (without any drop !?!?) ?
... then gluing them together or using distributed replay.
As I am new with ROS, could you help to share some way/tool to merge the separate bag files so that the synchronization of the topics is guaranteed, please ? Our own idea is merging the left and right image topics into one before being recorded with rosbag.
Thanks in advance and best regards, Khang
As I am new with ROS, could you help to share some way/tool to merge the separate bag files so that the synchronization of the topics is guaranteed, please ? Our own idea is merging the left and right image topics into one before being recorded with rosbag.
If you replay them separately, they indeed won't be sync. You can first merge them into 1 bag (time-synchronized). See "Example merge" under this section in the README: https://github.com/ros2/rosbag2/tree/humble#converting-bags
I'm currently working on adding support for replaying multiple bags (time-synchronized of course), which means you wouldn't need to first merge the bags into 1 bag before replaying them, although this feature will not be available in Humble.
Just out of curiosity,
recording IMU topic at 200MHz
is this really 200 MHz data rate supported with physical sensor? it is almost processor frequency like Cortex-M???
Hi @l4es, As @christophebedard mentioned, we have the ros2 bag convert CLI tool, which is very powerful. However, in the current implementation, it doesn't take into account timestamps when merging multiple bags.
It basically takes the next message from each bag and puts it in the new bag in a round-robin cycle.
If you have a topic in one bag with a frequency at least twice more than other topics from other bags, it is likely that messages from that topic with higher frequency will not be ordered by timestamp in the resulting merged bag.
As @christophebedard mentioned, we gonna soon create a PR with an extension for the Rosbag2 player to allow playback from multiple bag files and respect the received timestamp for each topic and bag file.
Hi @christophebedard and @MichaelOrlov,
It is good to know and still okay for me to merge separate input bags into one output bag, as long as the output bag would be similar to the case of recording multiple topics into single bag file in terms of timing and synchronization (time-synchronized).
By the way of mentioning about the topics' timing and sync., I would like your confirmation about the chronical order of the topics in the output bag : Are they synced by the time they were received for recording or by the timestamp in their headers, please ?
@fujitatomoya :
is this really 200 MHz data rate supported with physical sensor? it is almost processor frequency like Cortex-M???
I used an MCU to read the IMU by interruption and to publish the IMU topic with microROS over serial port.
Best Regards, Khang
By the way of mentioning about the topics' timing and sync., I would like your confirmation about the chronical order of the topics in the output bag : Are they synced by the time they were received for recording or by the timestamp in their headers, please ?
Ordered by the time each message was received by the recorder, see https://github.com/ros2/rosbag2/blob/ee7f68a9ea3b4fdba3e9aeb02e639cd93b552bb4/rosbag2_transport/src/rosbag2_transport/bag_rewrite.cpp#L56. They are not ordered by the original publication timestamp.
Hi @christophebedard,
Thanks for your confirmation.
@MichaelOrlov, I did some extra tests in order to avoid as many as possible the frame drops :
-
Original case with separate left & right topics in raw :
-
Assemble left & right topic of my stereo camera :
-
Assemble and compressed left & right topic of my stereo camera :
It looks like that the drops are caused by the amount of data written to the storage (which is quite natural) : with compressed image, less data is written (~700MB compared to 2.6GB for around 1 minutes of recording) thus less drops. And another conclusion is that it is hard to achieve the ideal situation in which there's no data/frame drop, especially in case of image topics with which there's high amount of data.
Hi @l4es, First of all, need to figure out on what side frame drops happened.
- Do you use the bag split by size feature, or writing in the one bag file continiously?
- Do you see this message in the Rosbag2 recorder console
"Cache buffers lost messages per topic: " - What is your maximum disk write speed for the block size similar to the size of the camera image which you are storing in the bag?
- What is QoS settings? Reliability, Durability and more important DDS queue sizes?
- Have you tried to use composable recorder node to avoid extra data copies and transfer over the DDS?
- Have you tried to run recorder on an isolated cpu cores (at least 2-4) ?
Hi @MichaelOrlov ,
Thanks for your suggestions. Below are my answers :
- Do you use the bag split by size feature, or writing in the one bag file continuously?
I recorded single bag file continuously.
- Do you see this message in the Rosbag2 recorder console "Cache buffers lost messages per topic: "
No, I do not see the above message.
- What is your maximum disk write speed for the block size similar to the size of the camera image which you are storing in the bag?
Would it be possible to do that with dd command?
- What is QoS settings? Reliability, Durability and more important DDS queue sizes?
About the QoS settings, as said in initial comment, I tried the solution given in the following link https://robotics.stackexchange.com/questions/100644/detect-messages-dropped-by-ros2-bag-record, but got the Incorrect QoS compatibility warnings if I set the reliability of the relevant topic to reliable. So I believe that they are default settings, especially Reliability = Best Effort. About the DDS queue sizes, I haven't known how to set it. Could you advice?
- Have you tried to use composable recorder node to avoid extra data copies and transfer over the DDS?
My own publishing node has not been designed to be composable yet, is it possible to use just the composable recorder similar to this one : https://github.com/berndpfrommer/rosbag2_composable_recorder as I am still with ROS2 Humble?
- Have you tried to run recorder on an isolated cpu cores (at least 2-4) ?
I haven't tried this approach yet.
Hi @l4es
If you don't see the message "Cache buffers lost messages per topic: " when you stop recording with CTRL+C, it does mean that messages drop happened on the transport layer and internal Rosbag2 buffers didn't get to full.
To benchmark your disk writing speed, please refer to the https://github.com/ros2/rosbag2/tree/rolling/rosbag2_performance/rosbag2_performance_benchmarking#general-knowledge-io-benchmarking
If you are trying to override QoS settings and getting error messages that QoS settings are not compatible, it means that the publisher-side QoS settings need to be adjusted as well. Please refer to the https://docs.ros.org/en/rolling/Concepts/Intermediate/About-Quality-of-Service-Settings.html article. The DDS queue size is the QoS parameter
Depth
- Queue size: only honored if the “history” policy was set to “keep last”.
The composable recorder and player nodes are supported starting from the Jazzy release.
Hi @MichaelOrlov,
I would like to re-confirm that I did see the "Cache buffers lost messages per topic: " message when stopping recording of /camera/left/image_raw and /camera/right/image_raw with CTRL+C :
[INFO] [1731061741.190553915] [rosbag2_cpp]: Writing remaining messages from cache to the bag. It may take a while
[WARN] [1731061741.190605501] [rosbag2_cpp]: Cache buffers lost messages per topic:
/camera/right/image_raw: 1
/camera/left/image_raw: 2
Total lost: 3
[INFO] [1731061741.199839269] [rosbag2_recorder]: Event publisher thread: Exiting
[INFO] [1731061741.200036620] [rosbag2_recorder]: Recording stopped
[INFO] [1731061741.201083981] [rosbag2_recorder]: Recording stopped
Also, I managed to run the rosbag2_performance benchmarking on the testing system (Jetson Orin NX + Samsung NVMe SSD) which gives below result :
~/Workspace/Nvidia/ros_ws$ src/rosbag2/rosbag2_performance/rosbag2_performance_benchmarking/scripts/report_gen.py -i rosbag2_performance_test_results/test_mixed_110Mbs_no_transport_2024-11-09_12-17-22/
100Mbs_large:
msg_count_each: 100
msg_size_bytes: 10000000
publishers_count: 1
qos:
qos_depth: 5
qos_durability: volatile
qos_reliability: best_effort
rate_hz: 10
topic_root: benchmarking_large
10Mbs_many_frequent_small:
msg_count_each: 2000
msg_size_bytes: 100
publishers_count: 500
rate_hz: 200
topic_root: benchmarking_small
publisher_groups:
- 10Mbs_many_frequent_small
- 100Mbs_large
wait_for_subscriptions: true
Results:
Repetitions: 2
Max bagfile size: 0
Compression: <default>
Compression threads: 0
Compression queue size: 1
Recorded messages for different caches and storage config:
storage config: default:
cache 10,000,000 - min: 99.37%, average: 99.63%, max: 99.88%
cache 100,000,000 - min: 94.95%, average: 97.47%, max: 100.00%
storage config: storage_resilient.yaml:
cache 10,000,000 - min: 25.22%, average: 26.66%, max: 28.11%
cache 100,000,000 - min: 48.18%, average: 50.17%, max: 52.15%
Results:
Repetitions: 2
Max bagfile size: 0
Compression: zstd
Compression threads: 0
Compression queue size: 1
Recorded messages for different caches and storage config:
storage config: default:
cache 10,000,000 - min: 45.40%, average: 56.57%, max: 67.73%
cache 100,000,000 - min: 63.85%, average: 64.60%, max: 65.35%
storage config: storage_resilient.yaml:
cache 10,000,000 - min: 18.12%, average: 18.26%, max: 18.40%
cache 100,000,000 - min: 42.01%, average: 42.14%, max: 42.28%
And some first I/O benchmarking : with dd :
~/Workspace/Nvidia/ros_ws$ dd if=/dev/zero of=/tmp/output conv=fdatasync bs=384k count=1k; rm -f /tmp/output
1024+0 records in
1024+0 records out
402653184 bytes (403 MB, 384 MiB) copied, 1.90264 s, 212 MB/s
then with fio :
~/Workspace/Nvidia/ros_ws$ fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=write --size=500m --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32
fio-3.16
Starting 1 process
TEST: Laying out IO file (1 file / 500MiB)
Jobs: 1 (f=1): [W(1)][12.1%][w=199MiB/s][w=199 IOPS][eta 00m:51s]
Jobs: 1 (f=1): [W(1)][22.0%][w=235MiB/s][w=235 IOPS][eta 00m:46s]
Jobs: 1 (f=1): [W(1)][32.8%][w=141MiB/s][w=141 IOPS][eta 00m:39s]
Jobs: 1 (f=1): [W(1)][41.7%][w=230MiB/s][w=230 IOPS][eta 00m:35s]
Jobs: 1 (f=1): [W(1)][51.7%][w=117MiB/s][w=117 IOPS][eta 00m:29s]
Jobs: 1 (f=1): [W(1)][61.7%][w=65.0MiB/s][w=65 IOPS][eta 00m:23s]
Jobs: 1 (f=1): [W(1)][71.7%][w=86.1MiB/s][w=86 IOPS][eta 00m:17s]
Jobs: 1 (f=1): [W(1)][81.7%][w=114MiB/s][w=114 IOPS][eta 00m:11s]
Jobs: 1 (f=1): [W(1)][91.7%][w=78.0MiB/s][w=78 IOPS][eta 00m:05s]
Jobs: 1 (f=1): [W(1)][100.0%][w=202MiB/s][w=202 IOPS][eta 00m:00s]
TEST: (groupid=0, jobs=1): err= 0: pid=229317: Sat Nov 9 12:37:28 2024
write: IOPS=157, BW=157MiB/s (165MB/s)(9478MiB/60225msec); 0 zone resets
slat (usec): min=59, max=447593, avg=697.87, stdev=13175.38
clat (msec): min=3, max=1055, avg=202.15, stdev=138.21
lat (msec): min=3, max=1055, avg=202.85, stdev=138.15
clat percentiles (msec):
| 1.00th=[ 16], 5.00th=[ 61], 10.00th=[ 114], 20.00th=[ 127],
| 30.00th=[ 130], 40.00th=[ 133], 50.00th=[ 138], 60.00th=[ 146],
| 70.00th=[ 209], 80.00th=[ 309], 90.00th=[ 393], 95.00th=[ 510],
| 99.00th=[ 667], 99.50th=[ 718], 99.90th=[ 894], 99.95th=[ 986],
| 99.99th=[ 1053]
bw ( KiB/s): min= 4096, max=296960, per=100.00%, avg=161194.89, stdev=81609.28, samples=120
iops : min= 4, max= 290, avg=157.38, stdev=79.71, samples=120
lat (msec) : 4=0.09%, 10=0.49%, 20=0.80%, 50=2.62%, 100=4.57%
lat (msec) : 250=67.52%, 500=18.36%, 750=5.23%, 1000=0.28%, 2000=0.03%
cpu : usr=0.56%, sys=0.86%, ctx=8995, majf=0, minf=25
IO depths : 1=0.2%, 2=0.4%, 4=0.8%, 8=1.6%, 16=3.2%, 32=93.8%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.2%, 64=0.0%, >=64=0.0%
issued rwts: total=0,9478,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
WRITE: bw=157MiB/s (165MB/s), 157MiB/s-157MiB/s (165MB/s-165MB/s), io=9478MiB (9938MB), run=60225-60225msec
Disk stats (read/write):
nvme0n1: ios=64/9730, merge=0/382, ticks=14399/1823067, in_queue=1838088, util=99.75%
finally with strace :
~/Data/perf-tests/endurance/raw/strace$ strace -c ros2 bag record --max-cache-size 10 /camera/left/image_raw /camera/right/image_raw /imu/data_raw
[INFO] [1731130928.971590855] [rosbag2_recorder]: Press SPACE for pausing/resuming
[INFO] [1731130928.979452208] [rosbag2_storage]: Opened database 'rosbag2_2024_11_09-12_42_08/rosbag2_2024_11_09-12_42_08_0.db3' for READ_WRITE.
[INFO] [1731130928.983323172] [rosbag2_recorder]: Listening for topics...
[INFO] [1731130928.983328452] [rosbag2_recorder]: Event publisher thread: Starting
[INFO] [1731130928.983734087] [rosbag2_recorder]: Recording...
[INFO] [1731130929.588204464] [rosbag2_recorder]: Subscribed to topic '/imu/data_raw'
[INFO] [1731130929.891170185] [rosbag2_recorder]: Subscribed to topic '/camera/left/image_raw'
[INFO] [1731130929.892392719] [rosbag2_recorder]: Subscribed to topic '/camera/right/image_raw'
[INFO] [1731130929.892448975] [rosbag2_recorder]: All requested topics are subscribed. Stopping discovery...
strace: Process 229771 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.30 10.381692 29244 355 10 futex
0.19 0.019688 3 5935 2097 newfstatat
0.16 0.016992 3 4301 1803 openat
0.06 0.006653 4 1530 getdents64
0.06 0.006214 1 3256 read
0.05 0.005637 1 4007 fstat
0.05 0.005104 2 2518 close
0.04 0.003700 1 3133 3 lseek
0.03 0.003051 1 2172 1520 ioctl
0.02 0.001938 6 302 mmap
0.01 0.001197 6 193 mprotect
0.00 0.000386 20 19 sendmsg
0.00 0.000347 5 63 recvmsg
0.00 0.000297 5 59 brk
0.00 0.000285 6 46 munmap
0.00 0.000227 3 70 69 mkdirat
0.00 0.000165 11 14 clone
0.00 0.000164 5 28 socket
0.00 0.000134 9 14 sendto
0.00 0.000124 4 27 19 setsockopt
0.00 0.000121 2 45 21 faccessat
0.00 0.000106 3 31 7 flock
0.00 0.000076 25 3 fallocate
0.00 0.000075 5 13 1 unlinkat
0.00 0.000063 6 10 write
0.00 0.000060 3 18 6 bind
0.00 0.000050 0 69 fcntl
0.00 0.000037 2 14 epoll_ctl
0.00 0.000029 3 9 getsockname
0.00 0.000027 2 12 pwrite64
0.00 0.000025 5 5 4 linkat
0.00 0.000021 1 13 getpid
0.00 0.000019 0 73 rt_sigaction
0.00 0.000018 4 4 rt_sigprocmask
0.00 0.000015 3 4 2 readlinkat
0.00 0.000010 3 3 fchmod
0.00 0.000010 1 9 pread64
0.00 0.000009 9 1 sysinfo
[INFO] [1731131073.136563494] [rosbag2_cpp]: Writing remaining messages from cache to the bag. It may take a while
0.00 0.000008 2 3 getrandom
0.00 0.000005 1 3 dup
0.00 0.000005 5 1 geteuid
0.00 0.000004 2 2 getcwd
0.00 0.000004 4 1 statfs
0.00 0.000004 4 1 set_tid_address
0.00 0.000004 4 1 sigaltstack
0.00 0.000004 4 1 getegid
0.00 0.000004 2 2 gettid
0.00 0.000004 4 1 prlimit64
0.00 0.000003 3 1 eventfd2
0.00 0.000003 3 1 set_robust_list
0.00 0.000003 3 1 getuid
0.00 0.000003 3 1 getgid
0.00 0.000003 1 2 getsockopt
0.00 0.000002 2 1 epoll_create1
0.00 0.000001 1 1 timerfd_create
0.00 0.000000 0 1 uname
0.00 0.000000 0 1 execve
------ ----------- ----------- --------- --------- ----------------
100.00 10.454830 28404 5562 total
[WARN] [1731131073.136686888] [rosbag2_cpp]: Cache buffers lost messages per topic:
/camera/right/image_raw: 466
/camera/left/image_raw: 305
/imu/data_raw: 3216
Total lost: 3987
By the way, I think I found how to configure the buffsize and chunksize similarly to ROS1 rosbag by configuring the MCAP writter. I also used below configuration :
$ vi mcap_writer_options.yaml :
noChunkCRC: false
noChunking: false
noMessageIndex: false
noSummary: false
chunkSize: 10485760
compression: "Zstd"
compressionLevel: "Fastest"
forceCompression: false
$ ros2 bag record -s mcap --max-cache-size 1048576000 --storage-config-file /path-to/mcap_writer_options.yaml /camera/left/image_raw /camera/right/image_raw /imu/data_raw
However, the frame drops could not be avoided.
P/S : I will modify the code to be able to adjust the QoS settings of published messages and keep you updated later.
Hi, did you see any improvements by updating the QoS settings?