bevy icon indicating copy to clipboard operation
bevy copied to clipboard

bevy_reflect: Improve serialization format

Open MrGVSV opened this issue 2 years ago • 7 comments

Objective

The current serialization format used by bevy_reflect is both verbose and error-prone. Taking the following structs[^1] for example:

// -- src/inventory.rs

#[derive(Reflect)]
struct Inventory {
  id: String,
  max_storage: usize,
  items: Vec<Item>
}

#[derive(Reflect)]
struct Item {
  name: String
}

Given an inventory of a single item, this would serialize to something like:

// -- assets/inventory.ron

{
  "type": "my_game::inventory::Inventory",
  "struct": {
    "id": {
      "type": "alloc::string::String",
      "value": "inv001",
    },
    "max_storage": {
      "type": "usize",
      "value": 10
    },
    "items": {
      "type": "alloc::vec::Vec<alloc::string::String>",
      "list": [
        {
          "type": "my_game::inventory::Item",
          "struct": {
            "name": {
              "type": "alloc::string::String",
              "value": "Pickaxe"
            },
          },
        },
      ],
    },
  },
}

Aside from being really long and difficult to read, it also has a few "gotchas" that users need to be aware of if they want to edit the file manually. A major one is the requirement that you use the proper keys for a given type. For structs, you need "struct". For lists, "list". For tuple structs, "tuple_struct". And so on.

It also requires that the "type" entry come before the actual data. Despite being a map— which in programming is almost always orderless by default— the entries need to be in a particular order. Failure to follow the ordering convention results in a failure to deserialize the data.

This makes it very prone to errors and annoyances.

Solution

Using #4042, we can remove a lot of the boilerplate and metadata needed by this older system. Since we now have static access to type information, we can simplify our serialized data to look like:

// -- assets/inventory.ron

{
  "my_game::inventory::Inventory": {
    "id": "inv001",
    "max_storage": 10,
    "items": [
      {
        "name": "Pickaxe"
      },
    ],
  },
},

This is much more digestible and a lot less error-prone (no more key requirements and no more extra type names).

Custom Serialization

Additionally, this PR adds the opt-in ability to specify a custom serde implementation to be used rather than the one created via reflection. For example[^1]:

// -- src/inventory.rs

#[derive(Reflect, Serialize)]
#[reflect(Serialize)]
struct Item {
  #[serde(alias = "id")]
  name: String
}
// -- assets/inventory.ron

{
  "my_game::inventory::Inventory": {
    "id": "inv001",
    "max_storage": 10,
    "items": [
      (
        id: "Pickaxe"
      )
    ],
  },
},

Because we specified a custom Serialize implementation with #[reflect(Serialize)], we actually get the RON-style syntax for a struct.

Note: If we had done this for Inventory, it would still define the "type" and continue to put the data in "value" (but in RON syntax).

See Example
// -- assets/inventory.ron

{
"my_game::inventory::Inventory": (
 id: "inv001",
 max_storage: 10,
 items: [
   (
     name: "Pickaxe"
   ),
 ],
),
}

By allowing users to define their own serialization methods, we do two things:

  1. We give more control over how data is serialized/deserialized to the end user
  2. We avoid having to re-define serde's attributes and forcing users to apply both (e.g. we don't need a #[reflect(alias)] attribute).

Tuples!

Tuples and tuple-likes now use serde's de/serialize_tuple method. This provides better semantics— being fixed-size sequences, we should be able to indicate that size at when de/serializing. It also addresses some of serde's own limitations (see sections below).

Tuples

Obviously, tuples now use this format. This is better because tuples do have a maximum size according to serde. I don't think we actually run into this limitation, but it's a good reminder to avoid large tuples in case we want to be compatible with other formats.

{
  "(f32, f32)": (1.0, 1.0)
}
Old Format
{
  "type": "(f32, f32)",
  "tuple": [
    {
      "type": "f32",
      "value": 1.0
    },
    {
      "type": "f32",
      "value": 1.0
    }
  ]
}
Tuple Structs

This also applies to tuple structs. Though, it should be noted that there is a dedicated de/serialize_tuple_struct method in serde. However, it requires a static name, which is not yet provided by TypeInfo. Even if it was, we have no way of handling Dynamic*** types (see https://github.com/bevyengine/rfcs/pull/56).

{
  "my_crate::Bar": ("Hello World!")
}
Old Format
{
  "type": "my_crate::Bar",
  "tuple_struct": [
    {
      "type": "alloc::string::String",
      "value": "Hello World!"
    }
  ]
}
Arrays

It may be a bit surprising to some, but arrays now also use the tuple format. This is because they essentially are tuples (a sequence of values with a fixed size), but only allow for homogenous types. Additionally, this is how RON handles them and is probably a result of the 32-capacity limit imposed on them (both by serde and by bevy_reflect).

{
  "[i32; 3]": (1, 2, 3)
}
Old Format
{
  "type": "[i32; 3]",
  "array": [
    {
      "type": "i32",
      "value": 1
    },
    {
      "type": "i32",
      "value": 2
    },
    {
      "type": "i32",
      "value": 3
    }
  ]
}

Changelog

  • Re-worked serialization/deserialization for reflected types
  • Added TypedReflectDeserializer for deserializing data with known TypeInfo
  • Renamed ReflectDeserializer to UntypedReflectDeserializer
  • ~~Replaced usages of deserialize_any with deserialize_map for non-self-describing formats~~ Reverted this change since there are still some issues that need to be sorted out (in a separate PR). By reverting this, crates like bincode can throw an error when attempting to deserialize non-self-describing formats (bincode results in DeserializeAnyNotSupported)
  • Tuples, tuple structs, and arrays are now all de/serialized as tuples (similar to RON)

Migration Guide

  • This PR reduces the verbosity of the scene format. Scenes will need to be updated accordingly:
// Old format
{
  "type": "my_game::item::Item",
  "struct": {
    "id": {
      "type": "alloc::string::String",
      "value": "bevycraft:stone",
    },
    "tags": {
      "type": "alloc::vec::Vec<alloc::string::String>",
      "list": [
        {
          "type": "alloc::string::String",
          "value": "material"
        },
      ],
    },
}

// New format
{
  "my_game::item::Item": {
    "id": "bevycraft:stone",
  },
  "tags": ["material"]
}
  • There are also some smaller changes to how certain types are serialized/deserialized. Tuples, tuple structs, and arrays are all de/serialized as standard tuples now (if using a de/serializer like RON):
// Old format
{
  "type": "my_crate::Foo",
  "tuple": {
    "type": "(f32, f32)",
    "tuple": [
        {
          "type": "f32",
          "value": 1.0
        },
        {
          "type": "f32",
          "value": 1.0
        }
      ]
    },
    "tuple_struct": {
      "type": "my_crate::Bar",
      "tuple_struct": [
        {
          "type": "alloc::string::String",
          "value": "Hello World!"
        }
      ]
    },
    "array": {
      "type": "[i32; 3]",
      "array": [
        {
          "type": "i32",
          "value": 1
        },
        {
          "type": "i32",
          "value": 2
        },
        {
          "type": "i32",
          "value": 3
        }
      ]
    }
  }
}

// New format
{
  "my_crate::Foo": {
    "tuple": (1.0, 1.0),
    "tuple_struct": ("Hello World!"),
    "array": (1, 2, 3),
  }
}

Entering fewer/more than the exact amount of values results in an error when deserializing.

  • ReflectDeserializer has been renamed to UntypedReflectDeserializer

[^1]: Some derives omitted for brevity.

MrGVSV avatar Apr 22 '22 20:04 MrGVSV

Previously discussed in #92 and #4153. To make life easier for reviewers, how does this compare to this proposals?

alice-i-cecile avatar Apr 22 '22 20:04 alice-i-cecile

Previously discussed in #92 and #4153. To make life easier for reviewers, how does this compare to this proposals?

I actually hadn't seen #92, so thanks for pointing that out!

In regards to #4153, though, I think this is a good intermediate step towards a syntax like that. The main "issue" with the syntax proposed in that PR is that it's pretty reliant on the RON format and likely wouldn't translate to JSON or other formats well— especially if they're generated by a non-Rust tool (see @james7132's comment).

So I think this is (hopefully) a less controversial improvement to the format while we decide and finalize on something like #4153.

MrGVSV avatar Apr 22 '22 20:04 MrGVSV

Anecdotally, this will be really useful in a surprising way; I'm using reflection to power messages for my editor, so cutting down on the verbosity is a performance win as well as a reliability win for how much types can change or move before messages become invalid between versions.

PROMETHIA-27 avatar Jun 13 '22 02:06 PROMETHIA-27

as well as a reliability win for how much types can change or move before messages become invalid between versions.

This would be appreciated. I've had cases where I trivially reorganized code and completely broke all of my serialized structs in frustrating ways that are hard for beginners to debug.

alice-i-cecile avatar Jun 13 '22 03:06 alice-i-cecile

My suggestion on making the serialization syntax even more human-processable: omit type and value tags and express them as key-value pairs.

Example

Instead of

{
  "type": "my_game::inventory::Inventory",
  "value": {
    "id": "inv001",
    "max_storage": 10,
    "items": [
      (
        id: "Pickaxe"
      )
    ],
  },
},

we would write

"my_game::inventory::Inventory": {
  "id": "inv001",
  "max_storage": 10,
  "items": [
    (
      id: "Pickaxe"
    )
  ],
},

Why?

TL;DR it is just a less error-prone and more intuitive representation without artificial to JSON / RON / YAML rules, instead these rules are expressed naturally through JSON / RON / YAML syntax.

  1. With the current implementation you need to introduce artificial "you have to write type before value" rule. Ordering of key-values is unnatural for such data representations. With my suggestion you simply can't write value before key as there's no such syntax.
  2. Same deal with "you can't add multiple components of the same type" rule. Because ECS (Bevy's in particular) don't allow for multiple components, the data structure that represents an ECS entity's components is a component set (or a type-value dictionary/map which is almost the same) and absolutely not an array. Current format uses arrays, my suggestion uses dictionary/set representation as you simply can't define the same key (type) twice on one object. I don't think using arrays to represent sets is a good idea.
  3. It is in-line with Serde's suggested default serialization style (typetag also mentions this as Serde's default among other options).
  4. As an added bonus it is also more compact and has less indentation levels while not sacrificing readability.

Alternatives considered

#4153 is a good pitch too and aims to solve the same issue, except the solution @AronDerenyi proposed only solves one of the two inconsistencies. For RON it looks nicer, but it still uses arrays to represent entity's components, which allows for multiple components on a syntax level, which is semantically invalid and has to be limited, whereas what I propose doesn't allow for that. Also I don't think you really can express the same syntax with JSON, YAML etc. meaning that it is less portable.

Metadorius avatar Aug 02 '22 09:08 Metadorius

I really like this last suggestion here, by @Metadorius , for the format. :) Agreed with the reasoning. Seems like this is perhaps the best proposal we have so far.

inodentry avatar Aug 02 '22 09:08 inodentry

I updated the code and the PR description to follow @Metadorius' suggestion! It now uses {type_name: data} in place of {type: type_name, value: data}.

MrGVSV avatar Aug 02 '22 15:08 MrGVSV

Closed in favor of #5723 per https://github.com/bevyengine/bevy/pull/5723#discussion_r960251187 🎉

alice-i-cecile avatar Sep 03 '22 19:09 alice-i-cecile