gz-sim icon indicating copy to clipboard operation
gz-sim copied to clipboard

Data storage classes

Open chapulina opened this issue 3 years ago • 23 comments

One of the challenges with Ignition's modular architecture is to create data structures and convert between them across all libraries.

Types everywhere

For example, Ignition Math is used by pretty much all libraries, from SDFormat to Rendering and Physics, to store math data types like vectors. But that's not the only place where those types exist. For most Ignition Math types, we also have equivalent Ignition Msgs types, so that the data can be transmitted over the wire.

Sure, that's not a terrible situation, and each class has a different use case. But let's look at something more complex, like a 3D mesh.

We start from sdf::Mesh which stores some basic metadata like file path and scale. That's pretty much the same as what ignition::msgs::MeshGeom holds. But the actual 3D data, like vertices and normals, is held by ignition::common::Mesh, which is used by both Physics and Rendering. But by itself, it doesn't include all the data that those libraries need. Ignition Rendering has a wrapper that adds extra information: ignition::rendering::MeshDescriptor. And Ignition Physics needs more information to be passed alongside common::Mesh to its ignition::physics::mesh::AttachMeshFeature.

While implementing heightmaps, I created sdf::Heightmap (https://github.com/osrf/sdformat/pull/388) to hold metadata, which is similar to ignition::msgs::HeightmapGeom. Made use of common::HeightmapData to store the actual height data, but that doesn't hold any material information. So for Ignition Rendering, I created the ignition::rendering::HeightmapDescriptor class (https://github.com/ignitionrobotics/ign-rendering/pull/180), of which 80% is a direct copy from sdf::Heightmap, with an extra field for common::HeightmapData. The physics implementation will probably need to combine the SDF and Common types too.

With so many equivalent types, we also need lots of conversion functions between them:

Implications

All these different types affect:

  • Performance: time converting between types
  • Maintenance: adding a field to a type requires touching lots of libraries - they can easily get out of sync

Moreover, there's just too much boilerplate involved. The classes on SDF and Rendering for example use the PIMPL pattern to be extensible, so they need setters and getters for each property, besides the rule of 5 (until we have something like https://github.com/ignitionrobotics/ign-cmake/pull/123). This is all created manually, which causes inconsistencies. All this just to hold some data.

Moving forward

I'm opening this issue to gather some ideas. I think that the situation we have with the math types is pretty good. There are 2 classes to store data, one for operations (ignition::math) and one for transmission (ignition::msgs). How could we achieve an equivalent situation for the other types?

  • SDF types are not supposed to be performing operations, and there are even questions of whether they should be read-only (https://github.com/osrf/sdformat/issues/292). So I think they're bad candidates to be the "operation" data type.
  • Common types are going in a good direction, but:
    • They need to be extended to hold more metadata
    • Some aren't PIMPLized
    • Some aren't copiable / movable
    • Most importantly: they're bundled in a library that brings a lot of dependencies unrelated to those types
  • Ignition messages are the most compact way of defining data, but they can't be extended without breaking ABI (https://github.com/ignitionrobotics/ign-msgs/issues/113). It would also be tough to add operations to them.

chapulina avatar Dec 17 '20 04:12 chapulina

This reminds me of some of the discussions in ros2 about serialization and native data types. Is there any understanding we can draw from that experience?

scpeters avatar Dec 21 '20 19:12 scpeters

Let's start by removing the usage of SDF types inside components in favor of Ignition Messages.

This should help with the scripting effort too, so we can postpone adding scripting interfaces to SDFormat classes.

chapulina avatar May 03 '21 17:05 chapulina

Let's start by removing the usage of SDF types inside components in favor of Ignition Messages.

I don't think that's what we want to do. Generally, I think we should avoid making protobufs part of any of our public APIs if it can be avoided.

I would like to see a common struct as our actual component with appropriate conversion functions to both sdf and ignition::msgs.

mjcarroll avatar May 03 '21 17:05 mjcarroll

we should avoid making protobufs part of any of our public APIs if it can be avoided.

That's because of https://github.com/ignitionrobotics/ign-msgs/issues/113 (also pointed out above), right? Yeah, that's a good point.

I would like to see a common struct as our actual component with appropriate conversion functions to both sdf and ignition::msgs.

My main concern with this is that adding another type feels counterproductive when the goal is to reduce the number of types we have.


I'm more inclined to figuring out a way in which we can make protobuf work for us instead putting a lot of effort into designing yet-another-collection-of-types.

chapulina avatar May 03 '21 17:05 chapulina

I think based on the recommended guidance from the Protobuf C++ Tutorial, we're going to end up with another collection of objects one way or another.

Protocol Buffers and Object Oriented Design Protocol buffer classes are basically dumb data holders (like structs in C); they don't make good first class citizens in an object model. If you want to add richer behavior to a generated class, the best way to do this is to wrap the generated protocol buffer class in an application-specific class. Wrapping protocol buffers is also a good idea if you don't have control over the design of the .proto file (if, say, you're reusing one from another project). In that case, you can use the wrapper class to craft an interface better suited to the unique environment of your application: hiding some data and methods, exposing convenience functions, etc. You should never add behaviour to the generated classes by inheriting from them. This will break internal mechanisms and is not good object-oriented practice anyway.

This makes me think that the ign-msgs types themselves should be some wrapper around the protobuf types, rather than the protobuf types themselves.

mjcarroll avatar May 03 '21 17:05 mjcarroll

Another issue is that ign-msgs that contain other ign-msg types are heap allocated, and so defeat the purpose storing components in contiguous memory in the ECS.

azeey avatar Sep 22 '21 00:09 azeey

Another issue is that ign-msgs that contain other ign-msg types are heap allocated, and so defeat the purpose storing components in contiguous memory in the ECS.

This shouldn't be an issue from Fortress anymore, because we're not trying to allocate components contiguously in memory anymore since #856.

chapulina avatar Dec 15 '21 23:12 chapulina

It would be nice to revisit the topic of storing components in contiguous memory, that was one of the big motivations for moving to ECS. If using messages would block us from achieving that goal, then I'd suggest against their use in components.

Based on my reading of this thread, the reason to use protobuf in components is serialization performance. The reason to use SDF is to avoid ABI issues, follow guidance from the protobuf developers, and leave open the door for improving our cache performance.

It seems that SDF is the safer option. Open questions I have are:

  1. Is SDF serialization performance significant?
  2. Will the serialization issue be resolved for the majority of users by the single process GUI+server work?

nkoenig avatar May 05 '22 18:05 nkoenig

Is SDF serialization performance significant?

If it's being serialized to XML and back, I would say that the performance is not great.

Will the serialization issue be resolved for the majority of users by the single process GUI+server work?

As we still intend to support both use cases, it's hard to predict which portion of users will be impacted. People who are doing analysis or simulation on a single computer probably won't incur a ton of serialization, but any interaction with a remote or cloud-based simulation instance would have issues.

mjcarroll avatar May 05 '22 20:05 mjcarroll

The reason to use SDF is to avoid ABI issues, follow guidance from the protobuf developers, and leave open the door for improving our cache performance.

Unfortunately, most of the SDF DOM objects also make use of the PIMPL pattern which likely will block any sort of cache performance improvement. The best strategy would be to come up with another very basic (read: c-struct) representation of the data that could be either serialized to SDF DOM objects or protobuf messages. This would allow the ECS to operate on the most efficient in-memory version of the data.

mjcarroll avatar May 05 '22 20:05 mjcarroll

Another issue is that ign-msgs that contain other ign-msg types are heap allocated, and so defeat the purpose storing components in contiguous memory in the ECS.

I know I brought this up, but as of https://github.com/ignitionrobotics/ign-gazebo/pull/856, we don't store components in contiguous storage, so this may not be relevant anymore.

azeey avatar May 05 '22 20:05 azeey

It would be nice to revisit the topic of storing components in contiguous memory, that was one of the big motivations for moving to ECS.

I know it sounds backwards not to use contiguous storage anymore, but as detailed on #856, that had been hurting performance, rather than helping it.

It seems that SDF is the safer option.

Most SDF classes were originally designed to be "read-only" and part of a tree. As documented in most of them: Typical usage of the SDF DOM is through the Root object. Some of this may be changing, and there's an ongoing effort to document this here. But ultimately, it doesn't look like SDF classes were designed to be used as independent components that store plain data efficiently.

follow guidance from the protobuf developers

I believe the guidance is that we shouldn't be packaging the generated C++ code. It may be possible to solve this while still using them in components, see https://github.com/ignitionrobotics/ign-msgs/issues/113.

chapulina avatar May 05 '22 22:05 chapulina

I know it sounds backwards not to use contiguous storage anymore, but as detailed on https://github.com/gazebosim/gz-sim/pull/856, that had been hurting performance, rather than helping it.

I think you could argue that we were never really using contiguous storage correctly because many of our components were using the PIMPL pattern as well. The speedups in contiguous storage would come from using POD types as components.

it doesn't look like SDF classes were designed to be used as independent components that store plain data efficiently.

I agree with this as well. It would be interesting to experiment with POD components at some point, but I don't think it's a high priority at the moment.

I believe the guidance is that we shouldn't be packaging the generated C++ code. It may be possible to solve this while still using them in components, see https://github.com/gazebosim/gz-msgs/issues/113.

This is my interpretation as well.

mjcarroll avatar May 06 '22 12:05 mjcarroll

Some thoughts on using simple POD / structs:

  • More concretely, we're talking about using these types to describe things like
    • sensors
    • geometry
    • collision
    • light
    • joints
    • etc
  • They'd have the same ABI issues as gz-msgs, because they aren't PIMPLized
  • They'd only valuable if they help us reduce the number of types we handle.
    • That means they'd need to be as low level as gz-math types, and as such, be used by SDFormat, gz-common, etc.
    • At that point, they would be substituting the bulk of the SDF DOM classes

chapulina avatar May 27 '22 02:05 chapulina

➡️ Proposal

I've allowed myself to dream a little and think about what would be the ultimate solution to this situation if we had no baggage or backwards compatibility to maintain. The goals are:

  • To reduce the number of data structures we need to change when we add a parameter.
  • To make these changes in an ABI-compatible way

It's all SDF

Our API across the stack has always been grounded on the SDF spec. We have model, link, collision, visual, etc, entities and components because these are SDF concepts. Our messages mimic SDF types, our sensors, physics and rendering support parameters exposed through SDF.

Change only one place

The canonical SDF description is the XML spec. So here's an idea:

  1. Automatically generate protobuf messages from the SDF XML spec.

  2. Distribute those raw proto files (not the compiled C++ classes, as suggested in https://github.com/gazebosim/gz-msgs/issues/113)

  3. Sensors, Physics, Rendering and Sim compile those proto files during their own compilations and use the protobuf C++ APIs. I'll refer to them as gz::proto to differentiate from the gz::msgs that we currently have.

  4. All SDF DOM classes are replaced by these gz::proto classes for all set/get operations. All other SDF functionality is moved to free functions that take in gz::proto, using the Interface Principle. For example:

    sdf::load(const sdf::ElementPtr &_sdf, gz::proto::World &_worldMsg);
    sdf::toElement(const gz::proto::Link &_linkMsg)      
    
  5. All officially distributed components use:

    • std types, if not possible
    • gz::math types, if not possible
    • gz::proto types, if not possible
    • a custom struct

chapulina avatar May 27 '22 02:05 chapulina

I believe the guidance is that we shouldn't be packaging the generated C++ code. It may be possible to solve this while still using them in components, see https://github.com/gazebosim/gz-msgs/issues/113.

If we use gz-msgs as components, wouldn't that mean the generated C++ code is now part of our public API? If the goal is to move away from shipping generated protobuf code, I don't think we would be going in the right direction if we use them as components.

Consider a user application that generates it's own gz-msg based C++ code. Even though this application is using the same proto files, there is no guarantee that the generated classes would be ABI compatible with the generated code used by gz-sim (they could be using completely different versions of protoc). Also, when linking against gz-sim to get access to EntityComponentManager, it would probably have link time errors because there would be multiple definitions of the gz-msg generated classes.

azeey avatar May 27 '22 02:05 azeey

If we use gz-msgs as components, wouldn't that mean the generated C++ code is now part of our public API? If the goal is to move away from shipping generated protobuf code, I don't think we would be going in the right direction if we use them as components.

I agree, it seems like a step in the wrong direction. Fundamentally, protobuf is a serialization format, it's generally going to be sub-optimal to use for anything other than serialization.

I know there is hesitance to add another set of structures, but I think about it the way that ROS 2 works. You have a set of structures that the user interacts with (the ROS idl) structs, and you have a set of structures that are used for serialization/deserialization (in some RMW implementations). These two things are generated from the same set of rosidl files, as well as all of the code to migrate between the two.

I think it would make the most sense to have a POD representation and a protobuf generation and conversion code, all generated from the SDF spec. The POD objects could be stored as components, but when you went to publish with ign-transport, it would JIT convert to the protobuf object, serialize, and publish. Additionally, you could have generated code for sdfdom to these POD structs.

This is also probably closer to capnproto's design philosophy.

mjcarroll avatar May 27 '22 14:05 mjcarroll

there is no guarantee that the generated classes would be ABI compatible with the generated code used by gz-sim (they could be using completely different versions of protoc)

Maybe this can be worked around by enforcing a protoc + gz-proto version combination?

when linking against gz-sim to get access to EntityComponentManager, it would probably have link time errors because there would be multiple definitions of the gz-msg generated classes.

Yeah good point, this may be a deal breaker.

have a POD representation

The problem with POD types is that they can't be extended without breaking ABI. On the flip side, they're better to store in contiguous memory in the ECM. I lean in favor of prioritizing extensibility without breaking ABI rather than in-memory optimization, because the former is something we're actively dealing with all the time, while the latter has a potential theoretical benefit.

have a POD representation and a protobuf generation and conversion code, all generated from the SDF spec.

I like the idea of having another type generated from the SDF spec though, I'd just like it to be PIMPL'ed. The result may look very similar to our current SDF DOM classes.

chapulina avatar May 27 '22 16:05 chapulina

The problem with POD types is that they can't be extended without breaking ABI. On the flip side, they're better to store in contiguous memory in the ECM. I lean in favor of prioritizing extensibility without breaking ABI rather than in-memory optimization, because the former is something we're actively dealing with all the time, while the latter has a potential theoretical benefit.

If we design this so that components can only be created by factory (i.e., private constructors), maybe we can use a custom memory allocator to create the dataPtr objects in a way that's optimal. We don't have to do that right now, but that could be something we do in the future if we see the PIMPL memory allocation is hurting performance.

azeey avatar May 27 '22 16:05 azeey

Maybe this can be worked around by enforcing a protoc + gz-proto version combination?

I think this is hard without vendoring protobuf. I think that protobuf, like many other Google libraries, is pretty aggressive about moving forward with the "live at HEAD" philosophy. Platforms that have rolling package managers (Windows and macOS) will quickly get out of sync with what is available in Ubuntu.

mjcarroll avatar May 27 '22 19:05 mjcarroll

Platforms that have rolling package managers (Windows and macOS) will quickly get out of sync with what is available in Ubuntu.

Yeah that's a good point, but I think the supported combination can be on a platform-basis, since users won't be sharing binaries across those platforms. We can say that we only support the upstream protobuf version on all platforms. That would be whatever comes with Ubuntu. On macOS and Windows, whenever the version changes, we recompile all our bottles / conda binaries to use the latest version, like we already do.

What we wouldn't support are custom protobuf versions coming from a PPA, built from source, or anything like that. Which we already don't anyway.

chapulina avatar May 31 '22 16:05 chapulina

The problem with POD types is that they can't be extended without breaking ABI.

Breaking ABI is only a problem if the the POD structs are part of the public API.

POD types could fit as part of a collection of interfaces to the same data. SDFormat could be the interface to read and write data to and from files, .proto could be the interface to serialize/deserialize over a network, POD types could be the interface internal to Gazebo, and PIMPL wrappers of the POD types could be given to users in Gazebo's public APIs and the public API's of the other gz/ignition libraries.

sloretz avatar Jun 22 '22 18:06 sloretz

Another thing to keep in mind for this discussion is that we have started supporting custom data in SDFormat files through data in custom xml namespaces. A map of std::any values (similar to gazebo::physics::Preset) could store arbitrary data for example.

scpeters avatar Jun 22 '22 21:06 scpeters