article about rosbags in ROS2
This PR is the place to gather feedback for the design doc about rosbags in ROS2.0
The section so far is structured such that it provides a couple of "Alternatives" for both the requirements for rosbags and the underlying data format to be used. Eventually, these alternatives will become part of the fixed requirements and proposed data formats once the discussion reaches a consensus.
I was thinking about this recently myself, and, since you started writing this, I'll tag this on here.
I thought it might be cool to introduce a layer of abstraction such that the user can provide functions to read from their format, then ros2 bag will simply use those functions when publishing its messages.
```python
from ros2bag.message_layer import MessageLayer
import ros2bag

class WeirdFormatPlayer:
    def __init__(self):
        self.convert_layer = MessageLayer()
        # override with a custom way to get the next message
        # (assign the bound method; don't call it here)
        self.convert_layer.get_next = self.get_next_message

    def get_next_message(self):
        # reads the next message from our weird format
        return next_message_to_publish

# somewhere in main
ros2bag.spin()
```
Given that this is hideous pseudocode, this abstract silliness might help someone rephrase this to something useful (or me when I have a few minutes).
@katiewasnothere: some of these ideas may be relevant to your project.
The Building Wide Intelligence Lab at UT Austin is currently looking into data loss issues with rosbag in ROS 1.0. Our goal is to be able to maintain the original topic publishing frequency within a rosbag so that there’s little to no data loss (minimal messages dropped and minimal to no compression).
For example, using the current implementation in ROS 1.0, our empirical data shows that raw messages from a kinect camera topic in a simulated environment typically produce a publishing frequency of about 19.942 hz. However, when all topics in the simulation environment are recorded to a rosbag (some 24 ROS topics ranging in frequency from about 200 hz to 1 hz), a significant amount of messages are dropped, reducing the publishing rate when the rosbag is played back to roughly 6.375 hz. The ability to save all topic messages for a given scenario can greatly help with training learning algorithms as well as in debugging failures. It would be nice to have functionality in ROS 2.0’s future rosbag implementation to prevent such extreme data loss.
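To put those figures in perspective, the playback rates above correspond to roughly two thirds of the messages being lost. A quick sketch of the arithmetic (pure Python; the function name is my own, the numbers come from the paragraph above):

```python
def drop_fraction(nominal_hz: float, observed_hz: float) -> float:
    """Fraction of messages lost, given the nominal publishing rate
    and the rate observed on playback."""
    return 1.0 - observed_hz / nominal_hz

# Kinect topic from the paragraph above:
loss = drop_fraction(19.942, 6.375)
print(f"{loss:.1%} of messages dropped")  # roughly 68%
```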
This might be a lofty goal, but I think it would be useful for auditing purposes to have an optional storage format that supports append-only, write-to-disk recording behavior. Think of rosbag recording in autonomous cars or other commercial applications.
I'm currently building an audit framework for creating existence proofs for append-only log files, rendering them immutable, or at least cryptographically verifiable on the fly. I'm not yet sure whether checkpointing log events (or, in the case of rosbag, messages or chunks of messages) via SQLite is feasible (I haven't yet looked into it deeply), but doing so on the ROS 1 v2 bagfile binary format is not very practical, as the file includes header records that are rewritten in place.
This header structure in the ROS 1 v2 bagfile of course speeds up traversal of large binary data with meta information, but its reliance on random write operations makes it hard to protect growing recording files from malicious mutation or accidental corruption, given that a digest of the file's written bytes cannot be chained to provide a time-series proof of existence, e.g. by compounding HMAC computations on log writes.
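As a sketch of what "compounding HMAC computations on log writes" could look like for an append-only format (function names and the 32-byte genesis value are my own choices, not part of any rosbag format):

```python
import hmac
import hashlib

def chain_digest(prev_digest: bytes, record: bytes, key: bytes) -> bytes:
    """Fold the next appended record into a running HMAC chain.

    Each digest commits to both the previous digest and the new bytes,
    so any later mutation of an earlier record invalidates every digest
    computed after it.
    """
    return hmac.new(key, prev_digest + record, hashlib.sha256).digest()

def verify_log(records, key: bytes, final_digest: bytes) -> bool:
    """Recompute the chain over all records and compare to the stored tip."""
    d = b"\x00" * 32  # agreed-upon genesis value
    for r in records:
        d = chain_digest(d, r, key)
    return hmac.compare_digest(d, final_digest)
```

Checkpointing the running digest periodically (per chunk, say) would let a verifier prove the file's prefix existed at checkpoint time without rereading everything.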
Perhaps this is a discussion for a specific storage plugin for immutable data stores: http://usblogs.pwc.com/emerging-technology/the-rise-of-immutable-data-stores/
but I just wanted to mention this as data storage formats are touched upon in this PR.
I think it would be useful for auditing purposes to have an optional storage format that supports append-only, write-to-disk recording behavior.
Big vote of support for this idea.
The Building Wide Intelligence Lab at UT Austin is currently looking into data loss issues with rosbag in ROS 1.0. Our goal is to be able to maintain the original topic publishing frequency within a rosbag so that there’s little to no data loss (minimal messages dropped and minimal to no compression).
This is as much about the tool implementation as it is about the format. A lossless rosbag would need to provide timing (possibly hard real-time) and bandwidth guarantees about how much data it can save. It may need to use strategic buffering, it may need to leverage OS support for buffers, it would need to understand the storage medium and that medium's capabilities, it would probably need to make configurable decisions about what gets dropped if there is not enough storage bandwidth available, and probably a whole pile of things I haven't thought of because I'm not a storage expert.
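As a toy illustration of the "strategic buffering" and "configurable decisions about what gets dropped" points above, here is a minimal sketch (all names hypothetical, not a real rosbag API) of a recorder that decouples message arrival from disk writes via a bounded queue with a configurable drop policy:

```python
import queue
import threading

class BufferedRecorder:
    """Sketch: subscription callbacks push into a bounded queue; a worker
    thread drains it to the storage backend. When the queue is full, either
    the oldest or the newest message is dropped, and drops are counted so
    the loss is at least observable."""

    def __init__(self, write_fn, maxsize=1024, drop_oldest=True):
        self._q = queue.Queue(maxsize=maxsize)
        self._write_fn = write_fn          # e.g. the storage plugin's write
        self._drop_oldest = drop_oldest
        self.dropped = 0
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def on_message(self, msg):
        while True:
            try:
                self._q.put_nowait(msg)
                return
            except queue.Full:
                if self._drop_oldest:
                    try:
                        self._q.get_nowait()   # evict oldest, keep newest
                        self.dropped += 1
                    except queue.Empty:
                        pass                   # worker drained it; retry put
                else:
                    self.dropped += 1          # drop the newest message
                    return

    def _drain(self):
        while not self._stop.is_set() or not self._q.empty():
            try:
                msg = self._q.get(timeout=0.1)
            except queue.Empty:
                continue
            self._write_fn(msg)

    def close(self):
        self._stop.set()
        self._worker.join()
```

A real implementation would additionally need to understand the storage medium's bandwidth and, for hard real-time use, avoid allocation and locking on the hot path.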
In short, it would make a pretty interesting research project and the result would be massively useful to people like @ruffsl and the wider ROS community.
It would be worth looking at the logging tools used in the financial domain. They have to log absolutely everything exactly as it happened and have really tight timing requirements.
A notable issue with ROS1 bagging as a data recording approach is that it creates a parallel communication channel for the monitored topics, and is not always an accurate representation of data flow in the system being monitored.
In ROS1, where the `rosbag record` tool records topics via a standard pub/sub API, two bag recorders running on the same graph may end up with a different set of data. However, most traffic in ROS1 graphs is 'reliable'. I imagine this issue could be much more common in ROS2 due to the first-class support of 'best-effort' data transmission.
It might be worthwhile to consider either implementing, or leaving the door open to via a plugin model, something like a pcap-based `ros2bag record`, where rosbag data is collected directly from the wire (or from an existing packet capture).
ROS1 Notes
Here are some of my notes from after we logged over a petabyte of data with ROS1 `rosbag` in production with multiple GigE Vision cameras, 3D/2D LIDARs, GPS, etc.
Multithreaded compression
We were able to record uncompressed bags at speeds up to about 6Gb/s (sensor limited) with ROS1. Storage was to a striped pair of SATA3 4TB drives.
Compression issues caused silent failures, as `rosbag` was limited to a single compression thread per bag and was unable to provide sufficient throughput. With an unlimited queue size (`--buffsize`) configured, `rosbag` grew to fill system memory; the OOM killer then terminated `rosbag` processes when the single compression thread was unable to keep up. The alternative was dropping messages.
Better defaults / errors
I'm not sure the current defaults are the best defaults for most users, much less novice users.
It might be worth having a `rosbag benchmark` command to determine the optimal `--chunksize` based on actual publishing rates / compression performance.
- `rosbag` should throw warnings when the message queue (`--buffsize`) is full
- `rosbag` needs to log errors before the OOM Killer strikes.
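A `rosbag benchmark` command along those lines could start as simply as measuring sequential write throughput per candidate chunk size; this is a rough sketch (a real tool would also feed in actual publishing rates and compression performance, and the function name is hypothetical):

```python
import os
import tempfile
import time

def benchmark_chunk_sizes(total_bytes=64 * 1024**2,
                          chunk_sizes=(256 * 1024, 768 * 1024, 4 * 1024**2)):
    """Write `total_bytes` of random data in chunks of each candidate size
    and return a {chunk_size: bytes_per_second} map."""
    results = {}
    payload = os.urandom(max(chunk_sizes))  # reuse one buffer for all runs
    for size in chunk_sizes:
        with tempfile.NamedTemporaryFile() as f:
            start = time.perf_counter()
            written = 0
            while written < total_bytes:
                f.write(payload[:size])
                written += size
            f.flush()
            os.fsync(f.fileno())  # include the cost of hitting the disk
            elapsed = time.perf_counter() - start
            results[size] = written / elapsed
    return results
```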
Split/concatenate bags
We somewhat arbitrarily chose `--split` to be 2GB to avoid running into trouble. This worked well when we were post-processing data in the cloud, but it would have been nice to be able to concatenate multiple bags into a single bag for smoother playback in RViz.
I'd also argue that I should be able to split an existing bag and have metadata copied to each new bag.
When playing a sequence of bags there was some latency as the current bag was closed and the next one opened; depending on compression and system load, this could cause timing issues. Being able to concatenate bags would help with this.
Spatial Indexing
For a roughly 12 hour operational period we could record up to 4TB of data split into 2GB bags (~20 seconds per bag). To locate which bag held data associated with a given GPS position, we recorded a separate unsplit low-frequency bag to index GPS position information spanning the entire period. Our cloud processing system ingested the GPS index bag and would then process bags with the full sensor data as needed.
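The GPS-index lookup described above amounts to a two-step mapping (position → fix time → bag file); a minimal sketch, with all class and method names hypothetical:

```python
import bisect

class GpsBagIndex:
    """Map a GPS position to the ~20 s bag file recorded around it, using
    a separate low-frequency GPS index that spans the whole period."""

    def __init__(self):
        self._bag_starts = []   # sorted bag start times (seconds)
        self._bag_names = []
        self._fixes = []        # (time, lat, lon) samples from the GPS topic

    def add_bag(self, start_time, name):
        i = bisect.bisect(self._bag_starts, start_time)
        self._bag_starts.insert(i, start_time)
        self._bag_names.insert(i, name)

    def add_fix(self, t, lat, lon):
        self._fixes.append((t, lat, lon))

    def bag_near(self, lat, lon):
        """Find the GPS fix closest to (lat, lon), then the bag covering it."""
        t = min(self._fixes,
                key=lambda f: (f[1] - lat) ** 2 + (f[2] - lon) ** 2)[0]
        i = bisect.bisect_right(self._bag_starts, t) - 1
        return self._bag_names[i] if i >= 0 else None
```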
Caching
We used `rosbag info` to validate data uploads to the cloud. However, it can take considerable time to run when processing vast quantities of data. I think it is worth considering caching the output in a metadata record to help with data validation.
Param/TF Recording
Recording and playback of TFs never quite works as expected. The main rosbag issue we had is that if a static TF was published at 1Hz and we split bags the first second of data might be discarded as the associated TF was in the previous bag.
One issue commonly encountered during development was trying to debug a node that published a TF; bag playback would broadcast the same TF, occasionally causing confusion among developers. While this can be done with `rosbag filter`, I'd like to propose adding something like `--no-tf` to `rosbag play` to avoid the duplication.
From a usability standpoint, maybe it makes sense to store static TF data separately at the beginning of each bag and enable `rosbag play` to automatically launch a static TF publisher via plugins.
Service calls / Parameter Server
Is it reasonable to publish service call information in a manner similar to diagnostic aggregator for recording/playback? Does this provide a reasonable way of recording parameter updates?
ROS1 Reliability
It is my experience that `rosbag` is extremely reliable when everything else is working. I do not believe we found any real bugs in `rosbag`. If asked to quantify the reliability, I would validate `rosbag` against something like pcap or pf_ring on actual hardware data to isolate network issues.
During development, many of our issues turned out to be network related. Collisions caused retransmissions, which consumed bandwidth, which caused collisions, etc. The worst of these was caused by 8 GigE Vision cameras that were hardware synchronized and crashed the network every 30th of a second, while iftop claimed bandwidth to spare due to its sampling window. While fixing this, we found that many quad-port gigabit Ethernet cards actually have an internal switch instead of separate independent physical interfaces and cannot sustain more than 1 Gb/s.
I also recall we had an issue similar to what @paulbovbel commented on where the publishing rate measured by diagnostic aggregators did not match the logged frequency.
It is also worth noting that not all hardware drivers have the same sense of timestamps; the time an image is retrieved from hardware may not be the time it was captured, especially when capture is hardware triggered. This caused issues that we initially blamed on `rosbag`.
Article Feedback
In general this looks like a good start
SQLite
I think SQLite storage is a reasonable option, however I'm unconvinced that it should be the default storage format. SQLite optimizes for select performance, not insert performance.
- In my experience with `rosbag`, onboard write performance during operation is generally more resource-constrained than playback during debugging on a development workstation
- System resources used by bagging are not available for other tasks; many embedded-ish systems (Atom, ARM) have memory & I/O bandwidth constraints
- SQLite b-trees do not take advantage of the temporal adjacency of message recording and playback, whereas with Bag Format 2.0 the messages are recorded next to each other on disk
- Offline conversion from Bag Format 2.0 to Postgres has previously worked well for me, and improving CLI conversion/export tools for SQLite/Postgres could provide a solution for random access use cases:
```
rosbag export example.bag sqlite://example.db3
rosbag export example.bag postgresql://ros@localhost/control
```
- My opinion is that the defaults should optimize for production instead of simple demos. IMHO high-datarate debugging falls into the 80% (of my personal use cases, obvs)
HDF5
:unamused: :-1: Edit: I'll add more on this soon.
ROS1
"No random access" While random access is not available by default, using a seek operation to access the correct chunk isn't that inefficient/difficult and usually most applications also require data that is adjacent anyway. It may be that some developers are more comfortable with having random access to data, even if they end up randomly accessing sequential data sequentially.
From the Gazebo comments
"Rosbag 2.0 format is analogous to a singly-linked list and requires reading from the beginning of the file."
I would like to note that it is not a singly-linked list of messages, but a singly-linked list of chunks that have a default size of 768KB. This makes it relatively efficient to seek to the next index record to find a particular timestamp in a bag. It looks like there would be an upper bound of 2604 seek operations to locate a timestamp. One way to improve performance may be to store a meta-index record with the offset positions of all previous index records at the end of the bag, to make it easier to implement a binary search of timestamp positions. This meta-index could be generated offline via `rosbag reindex` to simplify implementation. A fixed-size metadata record at the beginning could contain the position of the meta-index. This should bring things down to something on the order of 9 seek operations (for a 2GB bag file) to find the chunk associated with a given timestamp.
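The lookup such a meta-index would enable is an ordinary binary search over chunk start times; a sketch (the meta-index layout here is hypothetical):

```python
import bisect

def find_chunk(metaindex, stamp):
    """metaindex: list of (chunk_start_stamp, file_offset) pairs, sorted by
    start stamp, as could be stored in the proposed trailing meta-index
    record. Returns the file offset of the chunk covering `stamp`.

    Binary search gives O(log n) comparisons instead of walking every
    chunk's index record from the start of the file."""
    starts = [s for s, _ in metaindex]
    i = bisect.bisect_right(starts, stamp) - 1
    if i < 0:
        raise ValueError("stamp precedes first chunk")
    return metaindex[i][1]
```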
SELECT messages.message FROM messages JOIN topics ON topics.id = messages.topic_id WHERE time_recv_utc > 12345678 AND time_recv_utc < 23456789 AND topics.name LIKE "/some/topic/name";
As far as I can tell, this would be more efficient with format 2.0 + metaindex than with SQL.
SELECT topics.name, message_types.name FROM topics JOIN message_types ON topics.message_type_id = message_types.id;
This functionality could be efficient with format 2.0 by caching the output of `rosbag info`.
Alternate storage container
While I like the idea of full support for storing messages in a relational database, I believe the default should be a stream-oriented, append-only format 3.0.
Even if SQLite is the default, I would prefer ROS2 `rosbag` use a directory as the base storage container. This would provide a place to store bagging configuration information, signatures, hashes, and metadata.
```
rosbag2 play yetanothertest_2018-01-01-00-00-01

yetanothertest_2018-01-01-00-00-01/rosbag.config
yetanothertest_2018-01-01-00-00-01/metadata.yaml
yetanothertest_2018-01-01-00-00-01/wind_speed.rrd
yetanothertest_2018-01-01-00-00-01/gps.db3
yetanothertest_2018-01-01-00-00-01/imu.db3
yetanothertest_2018-01-01-00-00-01/left_camera.bag
yetanothertest_2018-01-01-00-00-01/right_camera.bag
yetanothertest_2018-01-01-00-00-01/control.bag
yetanothertest_2018-01-01-00-00-01/control.bag.sha1
yetanothertest_2018-01-01-00-00-01/control.bag.sign
```
This optimizes for implementation simplicity and multithreaded performance, but requires additional files (a petabyte of 2 GB Format 2.0 bags is already ~500k files, and things tend to get odd after 32k files in a single directory).
One option for reducing the number of files might be to use a loopback image mount
Features
Format 3.0
Given that it was able to push 6Gb/s, I'm a proponent of the ROS bag format 2.0 and would like to see it updated to support ROS2 natively instead of being bridged. Maybe CDR, protobuf, etc. could be implemented as record types in format 3.0.
Metadata
There have been several use cases where projects have needed some sort of metadata storage.
Previously, we used markdown files to make sure field notes on hardware changes and operational information (client name, test/production, experiment #, etc.) were passed from the field to the cloud.
On another project, due to limited engineering resources available at the time, we published camera serial numbers to separate topics to track hardware changes. This worked well enough, but it required scanning the bag to grab the serial numbers or discarding messages published before the serial numbers were published. I think the current "standard" for this is to output serial numbers via a diagnostic aggregator, which isn't much better. Reimplementing this, I would have preferred storing the serial numbers in metadata so it is stored once at the beginning.
For cloud uploads we needed some metadata to help validate that the data sent was received. In this case we wrote the output of `rosbag info` to a file, uploaded the data, and then ran `rosbag info` in the cloud to check that they matched. It would have been nice to cache this at the beginning of the same file.
```
./Storage/2017-01-25-20-03-14.bag
./Storage/2017-01-25-20-03-14.md
./Storage/2017-01-25-20-03-14.info
./Storage/2017-01-25-20-03-14.sha1
```
Fixed size time series data
I think it is worth considering how to support something like RRDTool for data that decreases in resolution over time.
From a computer vision perspective lossy compression can be problematic, however something like RRDTool that supports dropping frames but keeping each frame uncompressed for older data may be useful for dash cam and blackbox applications. 30fps for the previous hour, 10fps for the previous day, 1 fps for the previous week, etc.
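The tiered retention schedule described above (30 fps for the last hour, 10 fps for the last day, 1 fps for the last week) can be expressed as a simple age-based downsampling rule; a sketch, with hypothetical function names:

```python
def tiered_keep(frame_age_s: float) -> float:
    """RRD-style retention: return the frame period (seconds) to keep for
    a frame of the given age, per the schedule in the text above."""
    hour, day, week = 3600, 86400, 7 * 86400
    if frame_age_s <= hour:
        return 1 / 30       # 30 fps for the previous hour
    if frame_age_s <= day:
        return 1 / 10       # 10 fps for the previous day
    if frame_age_s <= week:
        return 1.0          # 1 fps for the previous week
    return float("inf")     # drop everything older

def keep_frame(frame_age_s: float, frame_index: int, fps: int = 30) -> bool:
    """Decide whether frame `frame_index` (captured at `fps`) survives
    downsampling at its current age. Kept frames stay uncompressed."""
    period = tiered_keep(frame_age_s)
    if period == float("inf"):
        return False
    stride = max(1, round(period * fps))
    return frame_index % stride == 0
```

A background compaction pass could periodically apply `keep_frame` to aging chunks, much like RRDTool consolidates data points.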
@gbiggs I'm not sure about performance, but Cassandra's (Java-based) time-window compaction strategy looks like the financial industry's solution: https://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy https://www.datastax.com/customers
@LucidOne: your comments appear very rational and based on experience with the current (ROS 1) `rosbag` in production environments that (as you phrase it yourself) "are not simple demos".
Quite a few people (in data science) seem to at least imply that HDF5 is a format well suited to storing (very) large datasets coming in at high data rates, is able to deal with partial data, supports hierarchical access, and has some other characteristics that make it seem like it would at least deserve consideration as a potential rosbag format 3 storage format.
Your comment on HDF5 was:
:unamused: :-1:
In contrast to your other comments, this one seemed to lack some rationale :)
Could you perhaps expand a little on why you feel that HDF5 would not be up to the task (or at least, that is what I believe these two emoticons are intended to convey)?
HDF5
Unlike SQLite, I have not actually used HDF5, so this is based a bit more on gut feeling and a bit less on hands-on experience. However, I have put the word out to one of my colleagues who does use it to get more information about their experiences, and will add that if/when they respond.
First, I would like to say that in theory HDF5 might be a great choice; in practice I have some concerns. The blog post in the article and the HN comments cover a lot.
Complexity
It seems to me that HDF5 is a complex Domain Specific Format, one like GeoTIFF that just happens to support multiple domains. I personally don't have the fondest memories of GeoTIFF or even TIFF.
"When TIFF was introduced, its extensibility provoked compatibility problems." -Wikipedia "Like the TIFF standard itself, GeoTIFF is conceptually simple, but the exact specification is complex and technical." -USGS "The algorithms and data structures stored in an HDF5 file can be complex and difficult to understand well enough to parse correctly." -HDF Group
Storage
HDF5 supports both contiguous and chunked data with B-tree indexes, which is pretty much ideal for what we want; however, the single base C implementation means everything has the same bugs.
I understand how this happens, but HDF5's extensible compression seems excessive to me.
As far as I can tell it uses ASCII by default.
Concurrent Read Access is supported but thread safety may be a work in progress.
Development process
According to this, the HDF Group has switched to Git and will accept patches if you sign a Contributor Agreement so they can close-source the patch for enterprise customers. :real talk:
Tests have been written but I am unable to find where Continuous Integration for HDF5 is hosted.
The website needs improvement/refactoring, with multiple copies of the Specification that may or may not be the same.
- https://portal.hdfgroup.org/display/HDF5/File+Format+Specification Last Modified: November 02, 2017 | 07:30 AM
- https://support.hdfgroup.org/HDF5/doc/H5.format.html Last modified: 25 April 2016
The Git repo exists, but I was unable to find an issue tracker in the release notes, in 2018. It looks like the HDF issue tracker may only be accessible to members of the HDF Group. :confused:
Summary
I think HDF5 might be reasonable in theory, and I agree with much of the reasoning behind switching to a 3rd-party file format maintained by a dedicated group, but in practice my gut feeling is that a lot of time will be spent working through the complexity and cultural differences.
On the other hand, the HDF Group just announced "Enterprise" support and hired a new Director of Engineering 6 months ago, so if their website improved, development processes modernized, benchmarks were reasonable, and engineering resources were available to deal with the complexity, I could be talked into it.
This was suggested to me as a possible option.
https://arrow.apache.org/ https://arrow.apache.org/docs/python/ipc.html https://en.wikipedia.org/wiki/Column-oriented_DBMS
Many, many years ago myself and Ingo Lütkebohle (@iluetkeb in case Github can ping him) did some work on improving the rosbag format. As I recall, Ingo and his group moved in the direction of an improved version of the rosbag format as it was at that time; I don't recall their results but I do remember that they changed stuff.
For myself, I spent quite a bit of time looking at using a modified version of the Matroska media container as a rosbag format and also as a format for recording point clouds. The reasoning behind this is that 1) rosbag is recording streams of time-indexed data, which media containers are explicitly designed to hold, and 2) Matroska uses an extremely flexible format called EBML (embedded binary markup language; think of it as being XML but with binary tags and for binary data). The format that resulted is specified here. I also had one prototype implementation and a half-complete redo based on what I learned, but given that part of that work's purpose was to teach me C++11 it's a bit of a mess in places. The work never went anywhere, unfortunately, so I don't know how performant the format is for recording.
The reason I'm mentioning this is that the modified format Ingo and I came up with, the Matroska format, and media containers such as MP4 in general support most of the features @LucidOne is asking for. In no particular order:
- Metadata can be stored anywhere in the file and is instantly locatable.
- The format provides a time-based index into the data at any resolution desired (even individual "frames" if you're willing to have a massive index). The index is stored at a place in the file recorded in the metadata at the start of the file so it can be quickly located. The index can easily be overwritten afterwards if it needs to be changed. Adding one after recording (rather than building it during recording) is also simple.
- How big a chunk is used for data streams can be changed even while recording and on a stream-by-stream basis.
- "Attachments" can be added to the file, allowing things like hardware serial numbers to be included, viewed, and even added or edited later if necessary without needing to have a dedicated data stream to store them.
- Native support for segmenting a set of data into multiple files, with the metadata duplicated in each segment or not as desired and segment ordering defined in the files or controlled at playback as desired. Segmenting is commonly used by media players (the DVD and blu-ray formats are built on it) and they must do it with zero latency so if playing back multiple files without gaps is important to you, look into how media players achieve gapless playback (hint: playback buffers).
- Data could be recorded as one stream per file, then a single file (or set of files covering a single set of data) can be built by muxing those streams together in the desired structure later on, without having to actually process the data itself - simply copy the segments into a single file and split them based on time.
- Has optimisations in the format to reduce metadata, enabling recording with minimal overhead (when combined with buffering) and reduced disk space.
- Files can be recovered so long as you have the SeekHead element, and you can probably recover the file even if you don't. Reorganising a file is also possible if you have the disc space, so this enables recording data as it comes in then reorganising it afterwards for efficient playback or querying or to put each data stream's data all together or whatever. There are places in the format where information can be optionally included to make recovery from corrupted streams easier and more robust.
- Supports chapters, so you could make chapters with periodic GPS coordinates for the title so you can quickly find the place in your 12 hours of data that corresponds to a particular position.
- Supports tagging, which is good for cloud storage services.
- Fixed parameters could be done using attachments. I think it would be easy to add an additional element used for static topics, or add a flag to the streams so you can say to the player "this stream is static, it only has one frame, publish that and latch it".
Like I said, I never got as far as testing performance. My gut feeling is that you would need to record a file with the bare minimum of metadata, then reprocess it afterwards to add additional metadata like the cue index. However this would not be difficult and the format enables much of it to be done in-place, particularly if you are planning for this when the recording is made (which a tool can do automatically). I do think that the format's flexibility means it would be relatively easy to optimise how data is recorded to achieve high performance recording.
Ultimately, the work stopped due to other commitments coming down from on high, and a lack of motivation due to it being unlikely to be adopted by rosbag. If the ROS 2 version of rosbag follows through on the goal of allowing multiple storage formats, then I might be interested in picking it up again, perhaps as a tool to teach me another new language. 😄
I think that media formats should be investigated for their usefulness in the ROS 2 rosbag. If recording performance meets the needs of high-load systems, then the features of these formats are likely to be very useful.
In the same work I also looked at HDF5, at the urging of Herman Bruyninckx. I only vaguely recall my results, but I think my impression was that HDF5 was a bit over-structured for recording ROS data and would need work to generate HDF5-equivalent data structures for the ROS messages so that the data could be stored natively in HDF5, in order to leverage the format properly. It's not really designed for storing binary blobs, which is what rosbag does.
I emailed back and forth with a few earth science people who regularly work with HDF5. Here is a summary of my notes.
HDF5 Notes Continued
- They disagree with me that HDF5 is domain-specific, as all of the domain-specific bits are in the naming and layout. I still think it may be difficult to build generic tools without a pre-determined layout or naming, but perhaps there is enough introspection in the API to make it work.
- They explained that much of the complexity in HDF5 is for data (de)serialization, endian conversion, etc. If the plan is to store ROS messages as CDR in HDF5, then I don't see how the complexity is worth it. Does storing message elements as native types in HDF5 require an extra pair of deserialize/serialize operations? As an example, to store point clouds in HDF5 and maximize usability with existing tools, should we store them as LAS in HDF5, or as "Sensor Independent Point Cloud", which is apparently (I cannot find a link to the standard) a standard point-cloud HDF5 layout?
- Do we need better endianness support? Has anyone run into problems in practice? This is one example of how ROS handles endianness issues.
- Multi-writer support can be done with multiple files and a post-merge operation, if I understand correctly.
- It is worth noting that everyone I communicated with who used HDF5 liked it and is using it in professional environments for important projects.
- I have been assured that The HDF Group is working on making the issue tracker publicly available, and they are probably running CI internally.
Links
http://davis.lbl.gov/Manuals/HDF5-1.8.7/UG/11_Datatypes.html
https://www.hdfgroup.org/2015/09/python-hdf5-a-vision/
http://www.opengeospatial.org/projects/groups/pointclouddwg
http://docs.opengeospatial.org/per/16-034.html
https://wiki.osgeo.org/wiki/LIDAR_Format_Letter
Thanks for all the feedback so far. I've found it very interesting and useful. We're still reading and considering.
Another thing to consider (because there weren't enough already):
http://asdf-standard.readthedocs.io/en/latest/
That's the replacement for FITS (a format used to store astronomical data) and ASDF will be used with the James Webb Space Telescope project.
@Karsten1987 thanks for this PR. I will just add a couple of requirements for large-data logging in the automotive industry, in particular for the self-driving part of it.
Most of the data throughput comes from the sensors. A typical sensor setup consists of:
- 5 lidar sensors (1 on the roof, 1 front, 1 rear, 1 next to each side mirror) => 4x4MB/s + 1x16MB/s = 32MB/s
- 6 cameras (3 under the windshield, 2 in each side mirror, one rear looking) => 6x90MB/s = 540MB/s
- 5 radars (1 long-range in the front, 4 mid-range mounted on every corner of the car) => KB/s
- 1 gps => KB/s
- 1 imu => KB/s
- 12 ultra sound sensors => KB/s
So we are talking about the data throughput under 1GB/s in total for a fully self-driving car.
In the development phase we normally want to record all of this data for debugging purposes. Recording in our case normally happens on an NVMe drive or a system like the one from Quantum: https://autonomoustuff.com/product/quantum-storage-solution-kits/. In the development phase we prefer not to compress the data, since that binds additional resources, and we would at least like to know if data was lost before it was flushed to the drive. We would also like service calls to be recorded as well.
In the production stage (that is, a self-driving car as a product) we will also record all the data, but only over a short time period, probably about a minute. In this setting all the data should be recorded in-memory into a ring buffer that rolls over every minute. A separate application running in a non-safety-critical part of the system (a separate recording device, or a process in a different hypervisor) should then fetch this data and store it to a drive. For the part of rosbag that writes the data into memory, it would be important that it is designed to be realtime-safe, as for instance introduced in this article: https://design.ros2.org/articles/realtime_background.html. So: no memory allocation during runtime, no blocking sync primitives, and no disk IO operations, as already mentioned above. There should be absolutely no message loss in this case.
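Python is obviously not the language for the hard real-time path, but the rollover logic of such a rolling recorder can be sketched; all names here are hypothetical. The key property is that storage is preallocated up front, so the steady-state write path only copies bytes into existing slots:

```python
class RollingRecorder:
    """Sketch of an in-memory rolling recorder: a fixed number of
    preallocated slots overwritten oldest-first, so the last N messages
    are always available for a separate process to fetch. A real
    implementation would also need lock-free synchronization."""

    def __init__(self, slots: int, slot_bytes: int):
        self._buf = [bytearray(slot_bytes) for _ in range(slots)]
        self._lens = [0] * slots
        self._next = 0           # total writes so far
        self._count = 0          # valid slots, capped at `slots`

    def write(self, payload: bytes):
        slot = self._buf[self._next % len(self._buf)]
        if len(payload) > len(slot):
            raise ValueError("payload exceeds preallocated slot size")
        slot[: len(payload)] = payload   # copy into preallocated memory
        self._lens[self._next % len(self._buf)] = len(payload)
        self._next += 1
        self._count = min(self._count + 1, len(self._buf))

    def snapshot(self):
        """What the separate, non-safety-critical process would fetch:
        the surviving messages in arrival order."""
        n = len(self._buf)
        start = (self._next - self._count) % n
        return [bytes(self._buf[(start + i) % n][: self._lens[(start + i) % n]])
                for i in range(self._count)]
```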
In terms of using ROS 1 rosbag tool, our experience is very, very similar to this one so I won't repeat.
Thank you all for being patient and giving feedback on this. We finally picked up the development of rosbags and therefore I am putting this officially in review.
TL;DR Given the feedback, we decided that there is no single best storage format that suits all needs and use cases. We will therefore focus on a flexible plugin API which supports multiple storage formats. We will start off with SQLite as our default storage format because of its simplicity and community support, and will simultaneously work on a plugin which reads legacy ROS 1 bags. The idea is that the API is powerful enough to easily support further storage formats, which might be faster than SQLite and can be provided by the ROS community.
We will start working on rosbags according to this design doc, but we are happy to incorporate any further feedback.
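To illustrate the plugin idea in the TL;DR above, here is a hypothetical Python sketch: the names (`BagStorage`, `SqliteStorage`) and method signatures are invented for this example and differ from the actual rosbag2 plugin API. The point is only that recording and playback code talks to one interface while the backend (SQLite, a legacy ROS 1 bag reader, ...) is swappable.

```python
import sqlite3
from abc import ABC, abstractmethod

class BagStorage(ABC):
    """Hypothetical storage-plugin interface."""

    @abstractmethod
    def write(self, topic, timestamp, data):
        """Store one serialized message."""

    @abstractmethod
    def read_all(self):
        """Yield (topic, timestamp, data) tuples in write order."""

class SqliteStorage(BagStorage):
    """Sketch of the default SQLite-backed storage."""

    def __init__(self, uri=":memory:"):
        self._db = sqlite3.connect(uri)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS messages "
            "(topic TEXT, timestamp INTEGER, data BLOB)")

    def write(self, topic, timestamp, data):
        self._db.execute(
            "INSERT INTO messages VALUES (?, ?, ?)", (topic, timestamp, data))

    def read_all(self):
        return self._db.execute(
            "SELECT topic, timestamp, data FROM messages ORDER BY rowid")

storage = SqliteStorage()  # a legacy ROS 1 reader would plug in the same way
storage.write("/scan", 100, b"\x01\x02")
print(list(storage.read_all()))  # [('/scan', 100, b'\x01\x02')]
```

A community-provided format would only need to implement the same interface; the recorder and player would not change.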
Folks, I am hoping this is the right place to make a suggestion about rosbag record. We have been working with high-frame-rate image capture and have come to the realization that the standard ROS 1.0 subscription model, which uses serialization / deserialization over the network interface, might not be the most efficient. For a lot of imaging / lidar type sensors, the ROS community came up with the clever idea of using nodelets to avoid that step. However, to my knowledge rosbag record was not able to take advantage of this shared-pointer-to-disk technique. My understanding is that ROS 2.0 is supposed to natively and transparently support shared pointers. Accordingly, I'd suggest that the rosbag record feature also use shared pointers to improve throughput, reduce latency, and lower computational overhead. Thanks for listening.
@vik748 wrote:
For a lot of imaging / lidar type sensors, the ROS community came up with the clever idea of using nodelets to avoid that step. However, to my knowledge rosbag record was not able to take advantage of this shared-pointer-to-disk technique.
off-topic, but see https://github.com/ros/ros_comm/issues/103.
Has there been any consideration how to handle the feature of "migration rules" from ROS 1?
@vik748
Folks, I am hoping this is the right place to make a suggestion about rosbag record. We have been working with high-frame-rate image capture and have come to the realization that the standard ROS 1.0 subscription model, which uses serialization / deserialization over the network interface, might not be the most efficient. For a lot of imaging / lidar type sensors, the ROS community came up with the clever idea of using nodelets to avoid that step. However, to my knowledge rosbag record was not able to take advantage of this shared-pointer-to-disk technique.
So, rosbag requires the data to be serialized in order to write it, because our messages are not POD (plain old data) and are therefore not laid out contiguously in memory. Serialization solves this by packing all the data in a message into a single contiguous buffer. So you won't avoid serialization when writing bag files, or deserialization when reading them.
In ROS 1 there's already a nodelet for rosbag (https://github.com/osrf/nodelet_rosbag), and the same advantage can already be gained in ROS 2, but in both cases the data must be serialized before being written to disk, even if serialization is avoided during transport.
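The "non-POD" point above can be made concrete with a toy example. A message containing a variable-length field is not laid out contiguously (the string lives elsewhere on the heap), so it has to be packed into one flat buffer before it can hit the disk. ROS 2 actually uses the middleware's serialization (e.g. CDR); the wire format below is invented purely for illustration.

```python
import struct

def serialize(frame_id, stamp_ns, ranges):
    """Pack a toy scan message into a single contiguous buffer."""
    name = frame_id.encode()
    buf = struct.pack("<IQ", len(name), stamp_ns)    # fixed-size header
    buf += name                                      # variable-length field
    buf += struct.pack("<I", len(ranges))
    buf += struct.pack("<%df" % len(ranges), *ranges)
    return buf  # ready to write to disk as-is

def deserialize(buf):
    """Recover the message fields from the flat buffer."""
    name_len, stamp_ns = struct.unpack_from("<IQ", buf, 0)
    offset = 12
    frame_id = buf[offset:offset + name_len].decode()
    offset += name_len
    (n,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    ranges = list(struct.unpack_from("<%df" % n, buf, offset))
    return frame_id, stamp_ns, ranges

blob = serialize("laser", 123456789, [0.5, 1.5, 2.25])
print(deserialize(blob))  # ('laser', 123456789, [0.5, 1.5, 2.25])
```

Shared pointers can eliminate the serialize/deserialize round-trip between nodes in the same process, but as noted above the flattening step is still required at the disk boundary.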
Has there been any consideration how to handle the feature of "migration rules" from ROS 1?
@dirk-thomas, we don't have concrete details on how to implement these rules for now, but the current design considers versioning and thus leaves room for migration rules in the respective convert functions. The idea is that when converting ROS messages from one serialization format to another, additional conversion policies/rules can be passed to these functions. One such rule could be migration. Does that make sense?
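A minimal sketch of that idea, with entirely hypothetical names (none of these exist in rosbag2): a convert function receives a table of migration rules as a policy and applies the one matching the version pair.

```python
def migrate_v1_to_v2(msg):
    # Example rule: a field was renamed between message versions.
    msg = dict(msg)
    msg["stamp"] = msg.pop("header_stamp")
    return msg

# Policy table passed into the conversion; users could register their own.
MIGRATION_RULES = {("v1", "v2"): migrate_v1_to_v2}

def convert(msg, from_version, to_version, rules=MIGRATION_RULES):
    """Convert a message between versions using pluggable migration rules."""
    if from_version == to_version:
        return msg
    rule = rules.get((from_version, to_version))
    if rule is None:
        raise ValueError(
            "no migration rule from %s to %s" % (from_version, to_version))
    return rule(msg)

old = {"header_stamp": 42, "data": b"scan"}
new = convert(old, "v1", "v2")
print(sorted(new))  # ['data', 'stamp']
```

Chaining such rules (v1 to v2 to v3) would give the equivalent of ROS 1's bag migration chains.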
It looks like rosbag2 development is in full swing, so a question about something that made me curious in the action design PR (https://github.com/ros2/design/pull/193#issuecomment-430025771) - will rosbag2 bag services as well as topics?
@paulbovbel, currently we can only grab topics in serialized form, and thus bag them nicely. That is not to say it's impossible to get service callbacks in a similarly serialized form; it's just not implemented yet.
However, I am unsure at this point whether we can guarantee that service requests can always be listened to by rosbag (without responding to them). When doing so, I also have to look into how to fetch the service responses.
Even if services can't be played back, there is value for debugging tools and inspection tools in at least logging them (assuming the response can also be captured).
btw, I mentioned this to @Karsten1987 a while ago: if you choose LTTng as a tracer, instrumenting services or any other points you want to get data from would be trivial. It's a whole different set of tools, independent of the middleware -- which could be both a pro and a con -- so not a decision to take lightly.
My takeaway is that, with the current implementation using DDS-RPC for services, services can't be bagged via rmw transport mechanisms, correct? This means that services and actions (as currently designed) are not going to make it into ROS 2 bags.
Not being able to bag services was a huge limitation in ROS1 (if bagging was your primary approach for grabbing debug/telemetry/etc. data, as I believe it is for many users), and a common reason for 'just using actions'. Now that will not be an option either.
Is anything in the first paragraph up for reconsideration in the near future? I recall @wjwwood mentioning implementing services-over-DDS-topics was considered, rather than using DDS-RPC, which sounds like it would be possible given the introduction of keyed topics.
I am not sure if we could generalize that all services are necessarily implemented via topics. That might hold for DDS implementations, but not necessarily for other middlewares.
Similarly to topics, rosbag could open a client/server for each service available at startup. This would depend, though, on whether the rosbag client would receive the answer from the original service server.
The first step however is to be able to receive service data in binary form, which the rmw interface doesn't currently allow.
In the production stage (that is, a self-driving car as a product) we will also record all the data, but only over a short time period, probably about a minute. In this setting, all data should be recorded in memory into a ring buffer that rolls over every minute. A separate application running in a non-safety-critical part of the system (a separate recording device, or a process in a different hypervisor) should then fetch this data and store it on a drive.
@dejanpan Do you know if this is going to be available in ROS 2? Or is Apex working on a solution to this? Do you have any pointers on how one can store recordings to memory in ROS 2?
I am interested in precisely this, or anything close to it. Thank you.