rosbag2 icon indicating copy to clipboard operation
rosbag2 copied to clipboard

Store message definition in bag

Open jhurliman opened this issue 3 years ago • 26 comments

Description

Neither the metadata.yaml file or .db3 file(s) contain the ROS2 message definitions used to generate the bag. This makes it impossible to deserialize rosbag2 files without having the correct version of all the original .msg files on the machine that is decoding the bag. Tools like Foxglove Studio are not able to decode rosbag2 files, and if message definitions are ever modified you have to store a full snapshot of message definitions alongside each bag file. If the wrong message definition is used during deserialization, there doesn't appear to be any mechanism to catch this leading to either decoding failures or potentially invalid decoding.

This is a regression from the rosbag1 format, where a .bag file is self describing and can be decoded even if the collection of .msg files used to generate it have been lost, or if a .bag file is shared with someone else.

jhurliman avatar Jun 15 '21 19:06 jhurliman

Thanks for filing this ticket - it's something that has come up as part of other discussions but was not explicitly tracked as its own feature request. I've re-worded the title to name the positive feature to build (rather than the negative lack of feature)

emersonknapp avatar Jun 15 '21 20:06 emersonknapp

@jhurliman I have not personally used this package, and there's definitely some major assumptions made in the codebase, but I think this is doing a lot of what you're envisioning: https://gitlab.com/ternaris/rosbags

danthony06 avatar Jun 16 '21 19:06 danthony06

Ah, looking at how they do that, it looks like they're manually embedding a bunch of information about the message format, so it can't handle arbitrary messages. I do like the idea of having all of the information to fully extra the messages from a bag embedded in the actual bag file.

danthony06 avatar Jun 17 '21 02:06 danthony06

@danthony06 Writing as one of the authors: rosbags so far does not store message definitions. It supports reading rosbag2 with custom message types by registering the idl/msg files, see https://ternaris.gitlab.io/rosbags/topics/typesys.html

There are two cases:

  1. conversion from rosbag1 to rosbag2; message definitions from rosbag1 are used to convert and we want to store these along with topic latching info for later deserialization and to convert back to rosbag1.
  2. recorded rosbag2; either the recorder or a post-processing tool needs to add the message definitions, depending on when these are available.

@emersonknapp et al: What are your thoughts on where and how to store message definitions and topic latching info?

chaoflow avatar Jun 17 '21 13:06 chaoflow

Hi all, i would like to get this feature as well. But as a first step it might be very usefull if the rosbag would store at least some kind of a hash of the message definition for each topic. With this you could check if you have the correct message definition available. Ich think rosbag1 stored both the full message definiton and some kind of hash for the message type.

pkess avatar Jun 17 '21 14:06 pkess

@chaoflow thank you for the correction, sorry I missed the details on how you're getting the format.

@pkess You're right that rosbag1 is storing the MD5 sum, but it's done on a connection basis, not message type basis, according to the documentation: http://wiki.ros.org/Bags/Format/2.0. I'm guessing this is to catch a corner case in ROS 1 where it might be possible to have nodes using the same message type, but are different versions with different internal fields. For example, Node A and B are compiled with one message definition and use that to communicate, while Nodes C and D have a message with the same name, but has been changed and built at a later time, and they can use that to communicate. So the bag file would have multiple entries for the same message name, but the implementation is different, so it has to track the MD5 of the connection.

danthony06 avatar Jun 17 '21 15:06 danthony06

What are your thoughts on where and how to store message definitions and topic latching info?

I see the obvious place as being in the TopicMetadata - this thing already exists and we should be able to safely expand it with some serialized representation of the message definition.

For latching info - this is handled using the Durability QoS using TRANSIENT_LOCAL durability with a history. This behavior provides a superset of the ROS 1 "latching" functionality. http://docs.ros.org/en/galactic/Concepts/About-Quality-of-Service-Settings.html#qos-policies for some info

emersonknapp avatar Jun 17 '21 15:06 emersonknapp

@emersonknapp Saving message definitions per topic could lead to some content duplication for topics sharing message types. Nested types (e.g. std_msgs/msg/Header) are often not part of the definition itself and need to be also stored somewhere. While yaml supports anchors and references, it would be easier to work with a flat storage for message definition metadata, e.g. with a type_definitions array as a sibling to topics_with_message_count.

Suggested content:

rosbag2_bagfile_information:
  type_definitions:
  - format: 'idl'
    definition: '{content of Bool.idl file}'
  - format: 'msg'
    definition: 'MSG: std_msgs/Int32\n{content of Int32.msg file or message_definition from ROS1 bag}'

The idl case is straightforward, as only the message definition content is necessary. The msg case requires one additional piece of information, namely the message type name. In the example above the message type name is prepended to the message definition, similar to concatenated definitions in message_definition fields of ROS1 bag connection headers. Alternatively, message type names could also be saved as a separate key. As both formats can define multiple types in a single message definition, a type name key would probably have to be a list of strings.

durko avatar Jul 05 '21 20:07 durko

My understanding is that DDS (and .idl files as an extension of that) is one possible transport system in ROS2 which has a pluggable transport layer. If that’s correct, making idl storage a first class concept in rosbag2 would be a leaky abstraction.

ROS1 bags handle this issue gracefully. The one thing they add beyond the raw msgdef is a checksum, so you don’t have to do a byte for byte long string comparison to see if a message definition has changed between two bags. ROS1 bags also store msgdefs per connection instead of per topic, which would fix the replay bug when two publishers publish to the same topic with different QoS profiles.

jhurliman avatar Jul 06 '21 01:07 jhurliman

My understanding is that DDS (and .idl files as an extension of that) is one possible transport system in ROS2 which has a pluggable transport layer. If that’s correct, making idl storage a first class concept in rosbag2 would be a leaky abstraction.

@jhurliman The proposal above is not bound to idl but already contains msg as other format.

ROS1 bags handle this issue gracefully. The one thing they add beyond the raw msgdef is a checksum, so you don’t have to do a byte for byte long string comparison to see if a message definition has changed between two bags.

When considering a checksum for message definitions for rosbag2 we should take normalization into account instead of simply hashing the string definition.

ROS1 bags also store msgdefs per connection instead of per topic, which would fix the replay bug when two publishers publish to the same topic with different QoS profiles.

QoS profile is a concept orthogonal to messages definitions, same as latching in rosbag1, i.e. a topic has a message type with a specific message definition and a topic can have a QoS profile.

ROS1 stores message definitions per connection and there can be multiple connections per topic with theoretically, regarding the storage format, different message types and definitions. I have not seen mixed-type-topics in practice, am not sure whether it is possible to generate them through recording, in contrast to manually writing bags, and don't see a use case for them so far.

chaoflow avatar Jul 16 '21 11:07 chaoflow

I agree that the mixed-type-topics is an unusual case, and unlikely to happen in practice. About the only time I can think of when I have encountered this is when there are multiple robots that are running different versions of software. It's generally not a great idea and usually indicates something wrong with the larger system, but it's certainly possible.

danthony06 avatar Jul 16 '21 19:07 danthony06

Lack of message definitions also affects the JS/WebSocket bridges rosbridge_suite and ros2-web-bridge. I just made a PR for the former that re-constitutes a plausible .msg definition from the generated Python classes (although the definition it generates is missing constants and default values): https://github.com/RobotWebTools/rosbridge_suite/pull/574

I hope this can be a temporary workaround, since it would be nicer to address this by including raw datatypes at the rosidl code generation level, so they can be used by both rosbag2 and dynamic tools like the rosbridge_server.

jtbandes avatar Jul 16 '21 20:07 jtbandes

In #819 we're discussing for db3 files to be self-contained, i.e. the message definitions need to be stored in each db3 file (in case of splitting) and are duplicated into metadata.yaml the same as happens already for TopicMetadata.

@emersonknapp you suggested to add message definitions to TopicMetadata, which would lead to duplication but we have the same pattern already for QoS profiles. Is it ok to duplicate message definitions as well? Do you see a way of storing nested types there as well?

Apart from where to store, for me it is not clear yet from where to get:

  • Does a recorder always have message definitions available?
  • Are there scenarios where these are not installed but only compiled into binaries?
  • Where does a recorder fetch the message definitions from?

chaoflow avatar Jul 22 '21 13:07 chaoflow

Most of these questions don't have answers yet, because there isn't a core mechanism for retrieving message definitions.

I've created https://github.com/ros2/ros2/issues/1159 to contain the discussion for this core feature. Hopefully, once we have that feature available, the rosbag2-side implementation will be relatively straightforward.

emersonknapp avatar Jul 22 '21 17:07 emersonknapp

Is it ok to duplicate message definitions as well?

Maybe I am thinking too naively here, but I can't imagine that the storage size of the message definition is in any case comparable to the amount of data (the actual incoming messages) we store in the actual rosbags. I don't really see a way of not duplicating the message definitions across the bag files, because as said, the bag files should presumably be self contained and independent. That way we can leverage tools like merging various bag files etc.

Karsten1987 avatar Jul 22 '21 19:07 Karsten1987

Note: discussion ros2 core functionality in ros2/ros2#1159

emersonknapp avatar Jul 28 '21 16:07 emersonknapp

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/announcing-live-ros-2-connections-in-foxglove-studio/22278/1

ros-discourse avatar Sep 14 '21 01:09 ros-discourse

Just reading this now, as part of some other work I'm doing, and I just wanted to point out that currently rosbag (AFAIK) requires the typesupport be installed locally for a message when recording or playing back, so the message definitions could be stored in those packages and accessed by rosbag locally, therefore not strictly requiring https://github.com/ros2/ros2/issues/1159 in order to address this feature. Obviously we'd like to have https://github.com/ros2/ros2/issues/1159 so that we can do away with this restriction that message packages need to be available locally for rosbag to record, but that's technically a separate issue.

Sorry if someone already pointed this out, because I was only able to skim the previous comments.

wjwwood avatar Feb 25 '22 02:02 wjwwood

so the message definitions could be stored in those packages and accessed by rosbag locally

This is the approach that @jhurliman is trying to take now, but isn't sure how to implement - see https://github.com/ros2/ros2/issues/1159#issuecomment-1064674121

emersonknapp avatar Mar 28 '22 22:03 emersonknapp

This is implemented now. See https://github.com/ros-tooling/rosbag2_storage_mcap/blob/main/rosbag2_storage_mcap/src/message_definition_cache.cpp

jhurliman avatar Mar 28 '22 22:03 jhurliman

Oh I see - nice! I might think best possible scenario is that this functionality is implemented in the ros2 core, and the contents passed to the storage implementation to store. Ideally the storage implementation doesn't have any dependency on the ROS2 system, and also the behavior could be shared between implementations. I do get that this was the fastest way to get it working in the mcap storage, though :).

emersonknapp avatar Mar 28 '22 22:03 emersonknapp

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/make-your-ros-2-files-self-contained-smaller-and-more-performant/25590/1

ros-discourse avatar May 16 '22 19:05 ros-discourse

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-bag-data-in-plotjuggler/27072/1

ros-discourse avatar Aug 24 '22 18:08 ros-discourse

We took the ideas from this thread, https://github.com/ros2/rosbag2/issues/819, https://github.com/ros2/rosbag2/issues/855, and several others and incorporated them into the design of MCAP, a highly efficient, self-contained recording format for ROS 2 and more.

A ROS 2 MCAP storage plugin is available for Foxy, Galactic, Humble and Rolling, and using it is as simple as ros2 bag record -s mcap.

Since this feature is now already supported by the MCAP storage plugin, I think we should either close it, or change the title to specifically request adding message definitions to files created by the sqlite plugin.

amacneil avatar Sep 13 '22 06:09 amacneil

I’ll close the issue since MCAP is stable and a preferred alternative now.

jhurliman avatar Sep 13 '22 06:09 jhurliman

@jhurliman I see how mcap solves #819, however I agree with @emersonknapp https://github.com/ros2/rosbag2/issues/782#issuecomment-1081232184 that storage of message definitions should not be specific to one format and especially should be realized for the default format. To that end I suggest to reopen this issue.

chaoflow avatar Sep 24 '22 09:09 chaoflow

@chaoflow FYI. Now we store message definitions in SQLite3 db3 files as well. I added new table message_definitions and incremented schema_version to 4 in schema table. Please refer to the #1293 for details. Looking forward for relevant updates in rosbags.

MichaelOrlov avatar Apr 20 '23 04:04 MichaelOrlov

@chaoflow FYI. Now we store message definitions in SQLite3 db3 files as well. I added new table message_definitions and incremented schema_version to 4 in schema table. Please refer to the #1293 for details. Looking forward for relevant updates in rosbags.

@MichaelOrlov Thank you very much! Is there a suggested migration path for previously recorded rosbag2?

chaoflow avatar Apr 24 '23 14:04 chaoflow

@chaoflow The migration path is simple. If db schema version < 4 there are no message_definition table.
Please see how we are determining db schema version here https://github.com/ros2/rosbag2/blob/18216763ac13d942ee582c4b658962e49179ba3d/rosbag2_storage_sqlite3/src/rosbag2_storage_sqlite3/sqlite_storage.cpp#L797-L814 Or you can directly check

if (database_->table_exists("message_definition")) {
  // TBD
}

MichaelOrlov avatar Apr 25 '23 15:04 MichaelOrlov

@MichaelOrlov Thanks and sorry for being unclear. I meant whether there is a migration path to get message definitions into bags that were previously recorded ? Something like: ros2 bag migrate oldbag newbag assuming the correct message definitions for the previously recorded bag are available.

chaoflow avatar Apr 25 '23 15:04 chaoflow