rosbag2
rosbag2 copied to clipboard
Revisit rosbag2 internal data structures
For the longest time I am actually kind of unhappy with the way we spread out our data structures in rosbag2_cpp and rosbag2_storage. Across these two we have:
struct TopicInformation
{
TopicMetadata topic_metadata;
size_t message_count;
};
struct BagMetadata
{
int version = 4; // upgrade this number when changing the content of the struct
uint64_t bag_size = 0; // Will not be serialized
std::string storage_identifier;
std::vector<std::string> relative_file_paths;
std::chrono::nanoseconds duration;
std::chrono::time_point<std::chrono::high_resolution_clock> starting_time;
uint64_t message_count;
std::vector<TopicInformation> topics_with_message_count;
std::string compression_format;
std::string compression_mode;
};
struct TopicMetadata
{
std::string name;
std::string type;
std::string serialization_format;
// Serialized std::vector<rclcpp::QoS> as a YAML string
std::string offered_qos_profiles;
bool operator==(const rosbag2_storage::TopicMetadata & rhs) const
{
return name == rhs.name && type == rhs.type && serialization_format == rhs.serialization_format;
}
};
struct SerializedBagMessage
{
std::shared_ptr<rcutils_uint8_array_t> serialized_data;
rcutils_time_point_value_t time_stamp;
std::string topic_name;
};
typedef struct ROSBAG2_CPP_PUBLIC_TYPE rosbag2_introspection_message_t
{
void * message;
char * topic_name;
rcutils_time_point_value_t time_stamp;
rcutils_allocator_t allocator;
} rosbag2_introspection_message_t;
I feel there's a strong redundancy in quite a few of them, which makes the actual rosbag2_cpp API really user-unfriendly.
Semantically, I mostly would like to revisit the SerializedBagMessage
. I don't think it's sensible to attach serialized data strictly to a topic. I would much rather see it attached to a type. The consequence of this would be that we don't just have a call to write(serialized_message)
, but more like a write(serialized_message, "topic_name")
which to me also makes way more sense.
Thinking out loud for a moment:
If we decouple the serialized payload from the topic information as such:
struct SerializedBagMessage
{
std::shared_ptr<rcutils_uint8_array_t> serialized_data;
rcutils_time_point_value_t time_stamp;
std::string type_name;
std::string serialization_format;
};
struct TopicMetadata
{
std::string name;
// Serialized std::vector<rclcpp::QoS> as a YAML string
std::string offered_qos_profiles;
bool operator==(const rosbag2_storage::TopicMetadata & rhs) const
{
return name == rhs.name;
}
};
we'd need a slightly more complex API for reading and writing:
void write(
std::shared_ptr<const rosbag2_storage::SerializedBagMessage> message,
const rosbag2_storage::TopicMetadata & metadata);
std::pair<std::shared_ptr<rosbag2_storage::SerializedBagMessage>, rosbag2_storage::TopicMetadata> read_next();
I believe though that ultimately this leads to a more intuitive API. The SerializedBagMessage
is somehow self contained, meaning all information needed to deserialize is attached to the actual data structure, and the TopicMetadata
itself really only describes how to re-publish the data.
Please let me know what you think about it. This ticket is generally also in conjunction with #457 which introduces rclcpp
as a dependency to rosbag2_cpp
and thus lets use rclcpp::SerializedMessage
.
I hope to get some of this in before Galactic feature freeze.
This turns out to be a non-trivial change. The main culprit I am seeing so far is that the serialized bag message as well as the topic metadata are represented in the sqlite3 data base as they are. That means, modifying these values leads to a modification in the database layout. That further means, we'd have to increase the bag-version number once more making it basically impossible to read old bag files with a current/newer rosbag version. Pretty unsatisfying ..
modifying these values leads to a modification in the database layout
Must this be true? I would think we could change the API but keep it so that the insert statements stay the same. This may make the write/read portion of storage implementation a little more complex. But, the rosbag2 generic API and the on-disk bag representation don't have to be a 1:1 match.
we'd have to increase the bag-version number once more making it basically impossible to read old bag files with a current/newer rosbag version.
Even if we accept the above, that we change database layout - I don't think this means we have to change the bag version number. My understanding of the bag version number is that it relates only to the metadata format, and that the storage implementation is responsible for everything that happens after that. In that case, if the metadata stays the same but the layout changes, would maybe the storage implementation (such as sqlite3
) need its own version identifier?
Regardless of that - rosbag2 needs to solve the "old bag problem" in some way - we will have to make changes at some point, it seems inevitable. I see two ways:
-
ros2 bag convert
which knows how to update bags from older versions to newer ones - Versioning API internal to the reader that knows, based on version number, how to read various versions
We have already done some ad-hoc implementation of the versioning API in the metadata parsing - although I'm not sure if it's the ideal way to handle it, it seems like it will become unmanageable after a few versions.
Regardless of that - rosbag2 needs to solve the "old bag problem" in some way - we will have to make changes at some point, it seems inevitable. I see two ways:
* `ros2 bag convert` which knows how to update bags from older versions to newer ones * Versioning API internal to the reader that knows, based on version number, how to read various versions
I am actually interested in this... I find the conversion of data from one format to another to be an interesting puzzle. Should I open a brainstorming issue for this?
If you have ideas which you can concentrate solely around the bag conversion topic, I'd suggest to open a dedicated ticket for it. For other general data type issues, we can continue the discussion here.
Semantically, I mostly would like to revisit the
SerializedBagMessage
. I don't think it's sensible to attach serialized data strictly to a topic. I would much rather see it attached to a type.
Just discovering the rosbag2 api, but I strongly agree with this. It would be nice to be able to check the type of the message contained in SerializedBagMessage
without having to deserialize it.
It would be nice to be able to check the type of the message contained in SerializedBagMessage without having to deserialize it.
Just want to note that you don't actually have to do this, I think it's slightly beside the point of the comment - there is a topics -> types
map in the BagMetadata
that you can use for lookups. The point is more that it makes more sense to group data together in a different way for the internal data flow.
It would be nice to be able to check the type of the message contained in SerializedBagMessage without having to deserialize it.
Just want to note that you don't actually have to do this, I think it's slightly beside the point of the comment - there is a
topics -> types
map in theBagMetadata
that you can use for lookups. The point is more that it makes more sense to group data together in a different way for the internal data flow.
Yes but that implies having access to the BagMetadata
object which you don't necessarily have when processing a rosbag2_storage::SerializedBagMessage
, whereas in ros1 you had this information accessible from a rosbag::MessageInstance
(which I see as the ros1 equivalent of rosbag2_storage::SerializedBagMessage
).
I am in the middle of ros1->ros2 code conversion which extensively uses the rosbag cpp api. I have the feeling that having the equivalent of a_rosbag_message_instance.isType<custom::my_msg>()
would have make it much simplier
- Closing this issue in favor of the other opened https://github.com/ros2/rosbag2/issues/1553.
After https://github.com/ros2/rosbag2/issues/1553 is implemented, one can match
topic_id
to thetopic_type
ortopic_name
stored in metadata. i.e. having a look-up tables for that
We have complaints from other users about the excessive memory usage of the rosbag2 recorder by storing the topic name as a string in the struct SerializedBagMessage
. Adding more fields to the struct SerializedBagMessage
will even worsen the situation with memory consumption in inner rosbag2 buffers.