
rosbag: introduce external index file to speed up reading

Open UniBwTAS opened this issue 3 years ago • 8 comments

Hello,

We propose optionally using an external index file (e.g. *.idx) to speed up reading. This is particularly useful for bags that reside on a slow HDD or, even worse, on a mounted network drive, where opening a bag can be quite slow. Our bags are about 100 GB or even larger (they contain a lot of camera images, point clouds, etc.).

The "traditional" way of reading a ROS bag requires reading through the whole file, because the index records (needed to locate the individual serialized messages within the file) are spread across the whole file. This is very I/O intensive and takes a while (~4 min for 100 GB on my HDD). On a mounted network drive, rosbag is almost unusable.

Compared to our large bag, the index file is just ~100 MB. Once this index file has been read, the locations of the serialized messages inside the bag are known and the bag can be played normally. Since only 100 MB have to be read, a bag on the disk starts to play almost instantly (2-3 s). Depending on the bandwidth, it is even reasonable to play a bag from a network drive: only the 100 MB index file has to be loaded completely, and afterwards the large bag file can be read sequentially, step by step.

This index file is essentially a regular bag file from which the large chunk records, which contain the actual message payload, have been pruned. To keep the full metadata available even without chunk records, it was additionally necessary to extend the data part of the index data records with a "chunk offset" field (alongside the existing time and offset fields, where offset is the offset of a message within a chunk). The chunk offset describes the position of the chunk within the bag file. Since the index file has (almost) the format of a regular bag file, the code required to read it is almost the same; only small changes were necessary.
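To make the extended index entry concrete, here is a minimal sketch of what such an entry could look like on disk. The standard bag format 2.0 index entry (time as two uint32 values plus a uint32 message offset, 12 bytes in total) is taken from the bag format; the width and position of the added chunk offset are an assumption (a trailing uint64), and the names `EXTENDED_ENTRY`, `pack_extended_entry` and `unpack_extended_entry` are hypothetical and not taken from the pull request.

```python
import struct

# Standard bag format 2.0 index entry (index data version 1):
# time as (secs, nsecs), both uint32, plus the uint32 offset of the
# message within its uncompressed chunk. All values are little-endian.
STANDARD_ENTRY = struct.Struct("<LLL")

# ASSUMPTION: a hypothetical extended entry that appends the position of
# the chunk within the bag file as a uint64. The actual layout used by
# the pull request may differ.
EXTENDED_ENTRY = struct.Struct("<LLLQ")


def pack_extended_entry(secs, nsecs, msg_offset, chunk_offset):
    """Serialize one hypothetical extended index entry."""
    return EXTENDED_ENTRY.pack(secs, nsecs, msg_offset, chunk_offset)


def unpack_extended_entry(raw):
    """Parse one hypothetical extended index entry into a dict."""
    secs, nsecs, msg_offset, chunk_offset = EXTENDED_ENTRY.unpack(raw)
    return {
        "secs": secs,
        "nsecs": nsecs,
        "msg_offset": msg_offset,      # offset of the message within the chunk
        "chunk_offset": chunk_offset,  # position of the chunk within the bag file
    }
```

With the chunk offset stored per entry, a reader would no longer need the chunk records themselves to know where a message lives; a uint64 is assumed here because positions in a 100 GB file do not fit into 32 bits.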

These index files can easily be generated from an existing bag file; a corresponding script is included in this pull request. Furthermore, this pull request is fully backwards compatible: reading via an index file is only used when such a file is found next to the bag (same name, but with the file extension ".idx"). Otherwise the bag is opened as usual.
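The fallback rule described above can be sketched in a few lines. This is only an illustration of the "same name, .idx extension, next to the bag" lookup; the function name `find_index_file` is hypothetical and is not the code from the pull request.

```python
import os


def find_index_file(bag_path):
    """Return the path of a sibling .idx file, or None if there is none.

    Hypothetical helper: the index file is expected next to the bag,
    with the same base name and the extension ".idx".
    """
    idx_path = os.path.splitext(bag_path)[0] + ".idx"
    return idx_path if os.path.isfile(idx_path) else None
```

A reader following this rule would take the fast path only when `find_index_file` returns a path and would otherwise open the bag the regular way, which is what keeps existing bags without an index file working unchanged.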

Command to generate an index file:
rosbag index-file 2021-08-09-08-55-29.bag

Start the player normally (it uses the generated .idx file if it exists):
rosbag play 2021-08-09-08-55-29.bag

I think it would also address/fix #117.

UniBwTAS avatar Aug 26 '21 12:08 UniBwTAS

I tested by manually building your edited rosbag and rosbag_storage packages and it works as expected for me. Additionally, it works through the C++ interface as expected:

rosbag::Bag bag;
bag.open(path_to_bag, rosbag::bagmode::Read);

In general, a >100 GB file now opens in around a second, compared to the 1-2 minutes I was getting before from my external SSD. This is a great alternative to changing the chunk size, as was done for the TUM-VI dataset. Using smaller chunks gives me smooth playback, and with this index file the opening is much faster. https://vision.in.tum.de/data/datasets/visual-inertial-dataset#faq

goldbattle avatar Apr 06 '22 16:04 goldbattle

I actually get an error on a bag that I had to reindex.

patrick@patrick-ThinkPad-P51:/media/patrick/RPNG_FLASH_5/d455$ rosrun rosbag rosbag index-file d455_room_07_stereo.bag 
Start reading bag indices of d455_room_07_stereo.bag ...
Generate index file d455_room_07_stereo.idx ...
Traceback (most recent call last):
  File "/home/patrick/workspace/catkin_ws_kalibr/src/rosbag_custom/rosbag/scripts/rosbag", line 35, in <module>
    rosbag.rosbagmain()
  File "/home/patrick/workspace/catkin_ws_kalibr/src/rosbag_custom/rosbag/src/rosbag/rosbag_main.py", line 1168, in rosbagmain
    cmds[cmd](argv[2:])
  File "/home/patrick/workspace/catkin_ws_kalibr/src/rosbag_custom/rosbag/src/rosbag/rosbag_main.py", line 678, in gen_idx_file_cmd
    index_file_op(args, options.force)
  File "/home/patrick/workspace/catkin_ws_kalibr/src/rosbag_custom/rosbag/src/rosbag/rosbag_main.py", line 986, in index_file_op
    op = _read_uint8_field(header, 'op')
  File "/home/patrick/workspace/catkin_ws_kalibr/src/rosbag_custom/rosbag/src/rosbag/bag.py", line 1994, in _read_uint8_field
    def _read_uint8_field (header, field): return _read_field(header, field, _unpack_uint8)
  File "/home/patrick/workspace/catkin_ws_kalibr/src/rosbag_custom/rosbag/src/rosbag/bag.py", line 1983, in _read_field
    raise ROSBagFormatException('expected "%s" field in record' % field)
rosbag.bag.ROSBagFormatException: expected "op" field in record

Seems to be due to this line here: https://github.com/ros/ros_comm/blob/981d29320ff6791865062e53fdd24ab760934671/tools/rosbag/src/rosbag/rosbag_main.py#L986

I added some small printouts and got this:

        # copy all records to output bag (except for chunk records and index data records)
        print("Generate index file %s ..." % output_idx_raw_path)
        while input_bag_raw.tell() < input_bag_raw_size:
            header = _read_header(input_bag_raw)
            print(str(input_bag_raw.tell()) + " of " + str(input_bag_raw_size))
            print(header)
            op = _read_uint8_field(header, 'op')

Final print before crash:

148282766751 of 148306297758
{'conn': b'\x03\x00\x00\x00', 'count': b'\x01\x00\x00\x00', 'op': b'\x04', 'ver': b'\x01\x00\x00\x00'}
148282766812 of 148306297758
{'compression': b'none', 'op': b'\x05', 'size': b'pl\x0c\x00'}
148283581059 of 148306297758
{'conn': b'\x04\x00\x00\x00', 'count': b'\x01\x00\x00\x00', 'op': b'\x04', 'ver': b'\x01\x00\x00\x00'}
148283581079 of 148306297758
{}
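The printed offsets can be cross-checked against the bag format 2.0 record layout (4-byte header length, header fields each stored as a 4-byte length followed by name=value, then a 4-byte data length and the data). The sketch below assumes that the printed position is the file offset right after each header was read; the helper names `header_size` and `data_size` are hypothetical.

```python
def header_size(fields):
    """Bytes consumed by one record header: a 4-byte header length plus,
    per field, a 4-byte field length, the name, '=' and the value."""
    return 4 + sum(4 + len(name) + 1 + len(value) for name, value in fields.items())


def data_size(payload_len):
    """Bytes consumed by the data part: a 4-byte length plus the payload."""
    return 4 + payload_len


# Headers copied from the debug output above.
chunk_header = {'compression': b'none', 'op': b'\x05', 'size': b'pl\x0c\x00'}
index_header = {'conn': b'\x04\x00\x00\x00', 'count': b'\x01\x00\x00\x00',
                'op': b'\x04', 'ver': b'\x01\x00\x00\x00'}

# The chunk is uncompressed, so its 'size' field equals the payload length.
chunk_len = int.from_bytes(chunk_header['size'], 'little')

pos = 148282766812                     # printed after the chunk header
pos += data_size(chunk_len)            # skip the chunk payload
pos += header_size(index_header)       # read the index data record header
after_index_header = pos               # compare with 148283581059
pos += data_size(12)                   # one index entry: 8-byte time + 4-byte offset
pos += 4                               # a header length field holding 0
after_empty_header = pos               # compare with 148283581079

remaining = 148306297758 - after_empty_header
```

Under these assumptions both computed offsets reproduce the printed ones, which is consistent with the record after the last index data record starting with a header length of zero, roughly 22.7 MB before the end of the file.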

I will try to upload the bag. Do you have an email I can send the link to?

goldbattle avatar Apr 06 '22 16:04 goldbattle

@goldbattle Nice to see that someone is using this feature. Sorry for the delay. Is this a private rosbag? I'm not sure whether it is a good idea to post my email address here.

AndreasR30 avatar Apr 11 '22 08:04 AndreasR30

Email me at ..... and I can send you the link.

goldbattle avatar Apr 11 '22 14:04 goldbattle

Thanks for the PR!

Hommus avatar May 18 '22 07:05 Hommus

@goldbattle Sorry, I was quite busy in the last few weeks. This rather seems to be a bug in the reindex feature, since a record header shouldn't be empty, as it is in your debug output. Of course, I could add a try-except around _read_uint8_field() and skip this record. However, this wouldn't fix the original bug. I will try to investigate how the reindex feature can produce an empty record header.
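As an aid for that investigation, a reader could report where the empty header sits instead of silently skipping it. The following is a self-contained sketch of a bag format 2.0 record header parser that does so; it is not the rosbag implementation, and `EmptyRecordHeader` and `read_record_header` are hypothetical names.

```python
import io
import struct


class EmptyRecordHeader(Exception):
    """Hypothetical error type for a record whose header length is zero."""


def read_record_header(f):
    """Read one record header from a binary file-like object.

    Returns a dict mapping field names to raw byte values and raises
    EmptyRecordHeader with the byte offset if the header length is 0.
    """
    start = f.tell()
    raw_len = f.read(4)
    if len(raw_len) < 4:
        raise EOFError("truncated header length at byte offset %d" % start)
    (header_len,) = struct.unpack("<L", raw_len)
    if header_len == 0:
        raise EmptyRecordHeader("empty record header at byte offset %d" % start)
    raw = f.read(header_len)
    if len(raw) < header_len:
        raise EOFError("truncated header at byte offset %d" % start)

    fields = {}
    pos = 0
    while pos < header_len:
        # Each field: 4-byte little-endian length, then b"name=value".
        (field_len,) = struct.unpack_from("<L", raw, pos)
        pos += 4
        name, _, value = raw[pos:pos + field_len].partition(b"=")
        fields[name.decode("ascii")] = value
        pos += field_len
    return fields


# Tiny demonstration: one valid header (op=0x04) followed by an empty one.
valid = struct.pack("<L", 4) + b"op=\x04"
stream = io.BytesIO(struct.pack("<L", len(valid)) + valid + struct.pack("<L", 0))
first = read_record_header(stream)
try:
    read_record_header(stream)
    error_message = None
except EmptyRecordHeader as exc:
    error_message = str(exc)
```

Raising with the offset keeps the failure visible, in line with the point that skipping the record would hide the underlying reindex problem rather than fix it.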

AndreasR30 avatar Jun 09 '22 11:06 AndreasR30