rosbag2 icon indicating copy to clipboard operation
rosbag2 copied to clipboard

Revisit rosbag2 internal data structures

Open Karsten1987 opened this issue 3 years ago • 9 comments

For the longest time I am actually kind of unhappy with the way we spread out our data structures in rosbag2_cpp and rosbag2_storage. Across these two we have:

struct TopicInformation
{
  TopicMetadata topic_metadata;
  size_t message_count;
};

struct BagMetadata
{
  int version = 4;  // upgrade this number when changing the content of the struct
  uint64_t bag_size = 0;  // Will not be serialized
  std::string storage_identifier;
  std::vector<std::string> relative_file_paths;
  std::chrono::nanoseconds duration;
  std::chrono::time_point<std::chrono::high_resolution_clock> starting_time;
  uint64_t message_count;
  std::vector<TopicInformation> topics_with_message_count;
  std::string compression_format;
  std::string compression_mode;
};

struct TopicMetadata
{
  std::string name;
  std::string type;
  std::string serialization_format;
  // Serialized std::vector<rclcpp::QoS> as a YAML string
  std::string offered_qos_profiles;

  bool operator==(const rosbag2_storage::TopicMetadata & rhs) const
  {
    return name == rhs.name && type == rhs.type && serialization_format == rhs.serialization_format;
  }
};

struct SerializedBagMessage
{
  std::shared_ptr<rcutils_uint8_array_t> serialized_data;
  rcutils_time_point_value_t time_stamp;
  std::string topic_name;
};

typedef struct ROSBAG2_CPP_PUBLIC_TYPE rosbag2_introspection_message_t
{
  void * message;
  char * topic_name;
  rcutils_time_point_value_t time_stamp;
  rcutils_allocator_t allocator;
} rosbag2_introspection_message_t;

I feel there's a strong redundancy in quite a few of them, which makes the actual rosbag2_cpp API really user-unfriendly.

Semantically, I mostly would like to revisit the SerializedBagMessage. I don't think it's sensible to attach serialized data strictly to a topic. I would much rather see it attached to a type. The consequence of this would be that we don't just have a call to write(serialized_message), but more like a write(serialized_message, "topic_name") which to me also makes way more sense.

Karsten1987 avatar Mar 11 '21 19:03 Karsten1987

Thinking out loud for a moment:

If we decouple the serialized payload from the topic information as such:

struct SerializedBagMessage
{
  std::shared_ptr<rcutils_uint8_array_t> serialized_data;
  rcutils_time_point_value_t time_stamp;
  std::string type_name;
  std::string serialization_format;
};

struct TopicMetadata
{
  std::string name;
  // Serialized std::vector<rclcpp::QoS> as a YAML string
  std::string offered_qos_profiles;

  bool operator==(const rosbag2_storage::TopicMetadata & rhs) const
  {
    return name == rhs.name;
  }
};

we'd need a slightly more complex API for reading and writing:

  void write(
    std::shared_ptr<const rosbag2_storage::SerializedBagMessage> message,
    const rosbag2_storage::TopicMetadata & metadata);

  std::pair<std::shared_ptr<rosbag2_storage::SerializedBagMessage>, rosbag2_storage::TopicMetadata> read_next();

I believe though that ultimately this leads to a more intuitive API. The SerializedBagMessage is somehow self contained, meaning all information needed to deserialize is attached to the actual data structure, and the TopicMetadata itself really only describes how to re-publish the data.

Karsten1987 avatar Mar 11 '21 20:03 Karsten1987

Please let me know what you think about it. This ticket is generally also in conjunction with #457 which introduces rclcpp as a dependency to rosbag2_cpp and thus lets use rclcpp::SerializedMessage.

I hope to get some of this in before Galactic feature freeze.

Karsten1987 avatar Mar 12 '21 01:03 Karsten1987

This turns out to be a non-trivial change. The main culprit I am seeing so far is that the serialized bag message as well as the topic metadata are represented in the sqlite3 data base as they are. That means, modifying these values leads to a modification in the database layout. That further means, we'd have to increase the bag-version number once more making it basically impossible to read old bag files with a current/newer rosbag version. Pretty unsatisfying ..

Karsten1987 avatar Mar 15 '21 23:03 Karsten1987

modifying these values leads to a modification in the database layout

Must this be true? I would think we could change the API but keep it so that the insert statements stay the same. This may make the write/read portion of storage implementation a little more complex. But, the rosbag2 generic API and the on-disk bag representation don't have to be a 1:1 match.

we'd have to increase the bag-version number once more making it basically impossible to read old bag files with a current/newer rosbag version.

Even if we accept the above, that we change database layout - I don't think this means we have to change the bag version number. My understanding of the bag version number is that it relates only to the metadata format, and that the storage implementation is responsible for everything that happens after that. In that case, if the metadata stays the same but the layout changes, would maybe the storage implementation (such as sqlite3) need its own version identifier?

Regardless of that - rosbag2 needs to solve the "old bag problem" in some way - we will have to make changes at some point, it seems inevitable. I see two ways:

  • ros2 bag convert which knows how to update bags from older versions to newer ones
  • Versioning API internal to the reader that knows, based on version number, how to read various versions

We have already done some ad-hoc implementation of the versioning API in the metadata parsing - although I'm not sure if it's the ideal way to handle it, it seems like it will become unmanageable after a few versions.

emersonknapp avatar Mar 17 '21 21:03 emersonknapp

Regardless of that - rosbag2 needs to solve the "old bag problem" in some way - we will have to make changes at some point, it seems inevitable. I see two ways:

* `ros2 bag convert` which knows how to update bags from older versions to newer ones

* Versioning API internal to the reader that knows, based on version number, how to read various versions

I am actually interested in this... I find the conversion of data from one format to another to be an interesting puzzle. Should I open a brainstorming issue for this?

jhdcs avatar Apr 06 '21 12:04 jhdcs

If you have ideas which you can concentrate solely around the bag conversion topic, I'd suggest to open a dedicated ticket for it. For other general data type issues, we can continue the discussion here.

Karsten1987 avatar Apr 06 '21 15:04 Karsten1987

Semantically, I mostly would like to revisit the SerializedBagMessage. I don't think it's sensible to attach serialized data strictly to a topic. I would much rather see it attached to a type.

Just discovering the rosbag2 api, but I strongly agree with this. It would be nice to be able to check the type of the message contained in SerializedBagMessage without having to deserialize it.

doisyg avatar Apr 28 '21 10:04 doisyg

It would be nice to be able to check the type of the message contained in SerializedBagMessage without having to deserialize it.

Just want to note that you don't actually have to do this, I think it's slightly beside the point of the comment - there is a topics -> types map in the BagMetadata that you can use for lookups. The point is more that it makes more sense to group data together in a different way for the internal data flow.

emersonknapp avatar Apr 28 '21 17:04 emersonknapp

It would be nice to be able to check the type of the message contained in SerializedBagMessage without having to deserialize it.

Just want to note that you don't actually have to do this, I think it's slightly beside the point of the comment - there is a topics -> types map in the BagMetadata that you can use for lookups. The point is more that it makes more sense to group data together in a different way for the internal data flow.

Yes but that implies having access to the BagMetadata object which you don't necessarily have when processing a rosbag2_storage::SerializedBagMessage, whereas in ros1 you had this information accessible from a rosbag::MessageInstance (which I see as the ros1 equivalent of rosbag2_storage::SerializedBagMessage). I am in the middle of ros1->ros2 code conversion which extensively uses the rosbag cpp api. I have the feeling that having the equivalent of a_rosbag_message_instance.isType<custom::my_msg>() would have make it much simplier

doisyg avatar Apr 29 '21 10:04 doisyg

  • Closing this issue in favor of the other opened https://github.com/ros2/rosbag2/issues/1553. After https://github.com/ros2/rosbag2/issues/1553 is implemented, one can match topic_id to the topic_type or topic_name stored in metadata. i.e. having a look-up tables for that

We have complaints from other users about the excessive memory usage of the rosbag2 recorder by storing the topic name as a string in the struct SerializedBagMessage. Adding more fields to the struct SerializedBagMessage will even worsen the situation with memory consumption in inner rosbag2 buffers.

MichaelOrlov avatar May 05 '24 04:05 MichaelOrlov