Support round-tripping unknown fields

Open ReubenBond opened this issue 3 years ago • 0 comments

Currently, Hagar will safely and predictably deserialize types where the payload contains unknown fields. It supports this by marking and otherwise ignoring the field when it encounters it in the bit stream. If it encounters a reference to that ignored field later, then it will look at that mark and deserialize the field (now knowing what type to deserialize the previously unknown field as).

However, Hagar does not yet support re-serializing that object with full fidelity: those ignored fields will not be serialized. To support scenarios where this is useful, Hagar should recognize objects which have a property/field marked with a [Hagar.ExtensionData] attribute. Initially, we can require that the member be declared as an object, and potentially add an interface later which allows some small degree of introspection, as needed.

Example:

public class MyData
{
  // This could be public
  [Hagar.ExtensionData]
  private object _extensionData;

  [Hagar.Id(0)]
  public int MyValue { get; set; }
}

We can also define an interface which users can optionally implement instead of annotating a field themselves.

public interface IHasExtensionData
{
  [Hagar.ExtensionData]
  object ExtensionData { get; set; }
}

Code generation needs to be updated to support identifying extension data members, deserializing into them, and serializing from them. This is a substantial change, since it means that the existing, optimized routine for serialization cannot be used. Instead, the generated code will need to check for extension data before every known field whose id is preceded by a gap, and again after the last known field.

Given a type definition:

public class MyData
{
  [Id(1)] public int MyInt { get; set; }

  [Id(2)] public int MyInt2 { get; set; }

  [Id(44)] public int MyInt3 { get; set; }
}

The serialization order would be

  • Serialize any unknown field with id 0
  • Serialize known field with id 1
  • Serialize known field with id 2 (no gaps between 1 and 2)
  • Serialize any unknown fields with ids between 2 and 44
  • Serialize known field with id 44
  • Serialize any unknown fields with ids greater than 44
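
The ordering above can be sketched as a merge of the known field ids with the buffered unknown fields. This is only an illustration under stated assumptions: `UnknownField` and `OrderingSketch` are hypothetical names rather than Hagar types, the unknown fields are assumed to be kept sorted by id, and strings stand in for actual writes. Generated code would presumably unroll the loop and omit the check where there is no gap (as between ids 1 and 2).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a buffered unknown field; not a Hagar type.
public sealed class UnknownField
{
    public UnknownField(uint id, string payload) { Id = id; Payload = payload; }
    public uint Id { get; }
    public string Payload { get; }
}

public static class OrderingSketch
{
    // Returns the order in which fields would be written. Assumes knownIds is
    // ascending and unknown is sorted by id.
    public static List<string> WriteOrder(uint[] knownIds, IReadOnlyList<UnknownField> unknown)
    {
        var output = new List<string>();
        var next = 0; // index of the next unknown field still to be written

        foreach (var id in knownIds)
        {
            // Write any unknown fields which fall in the gap before this known field.
            while (next < unknown.Count && unknown[next].Id < id)
            {
                output.Add("unknown " + unknown[next].Id);
                next++;
            }

            output.Add("known " + id);
        }

        // Write any unknown fields with ids greater than the last known id.
        for (; next < unknown.Count; next++)
        {
            output.Add("unknown " + unknown[next].Id);
        }

        return output;
    }
}
```

For the `MyData` type above (known ids 1, 2, 44) and unknown fields with ids 0, 7 and 50, the sketch yields unknown 0, known 1, known 2, unknown 7, known 44, unknown 50.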

The performance hit may be significant in some cases, so the code generator should decide whether to use the existing, optimized routine or this proposed routine based on whether the type (or a parent type) has an [ExtensionData] member.
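
The decision itself amounts to walking the type hierarchy and looking for a marked member. The sketch below shows that check using reflection purely for illustration; the real code generator works at compile time and would inspect symbols instead. `ExtensionDataAttribute` here is a local stand-in for the proposed attribute, and `RoutineSelector` and the sample types are hypothetical.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Local stand-in for the proposed [Hagar.ExtensionData] attribute.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
public sealed class ExtensionDataAttribute : Attribute { }

public static class RoutineSelector
{
    // DeclaredOnly plus the BaseType walk below also finds private members of base types.
    private const BindingFlags DeclaredInstanceMembers =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    // True if the type, or any base type, declares a field or property marked [ExtensionData].
    public static bool NeedsExtensionDataRoutine(Type type)
    {
        for (var current = type; current != null; current = current.BaseType)
        {
            var marked = current.GetMembers(DeclaredInstanceMembers)
                .Any(m => (m is FieldInfo || m is PropertyInfo)
                          && m.IsDefined(typeof(ExtensionDataAttribute), false));
            if (marked)
            {
                return true;
            }
        }

        return false;
    }
}

// Sample types for illustration only.
public class PlainData
{
    public int Value { get; set; }
}

public class BaseWithExtension
{
    [ExtensionData]
    private object _extensionData;
}

public class DerivedData : BaseWithExtension
{
    public int Value { get; set; }
}
```

With these sample types, `PlainData` would keep the existing optimized routine, while `DerivedData` would get the proposed routine because its parent type declares an extension data member.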

Similarly, deserialization will need to change, but that change is likely not as involved: instead of ignoring unknown fields, it will need to place them into the extension data.

Generated code can call static helper methods to help with that serialization and deserialization.
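
A minimal sketch of what such a helper might look like on the deserialization side, assuming unknown fields are simply appended to a list stored in the opaque extension data object. `ExtensionDataHelper`, `MyDataReader` and the key/value representation of fields are all hypothetical; a real implementation would read from the bit stream and would likely store the fields in their encoded form.

```csharp
using System;
using System.Collections.Generic;

public static class ExtensionDataHelper
{
    // Hypothetical helper: appends an unknown field to the extension data,
    // creating the backing list on first use.
    public static void AddUnknownField(ref object extensionData, uint id, object value)
    {
        var fields = extensionData as List<KeyValuePair<uint, object>>;
        if (fields == null)
        {
            fields = new List<KeyValuePair<uint, object>>();
            extensionData = fields;
        }

        fields.Add(new KeyValuePair<uint, object>(id, value));
    }
}

public class MyData
{
    // Would be marked [Hagar.ExtensionData] under the proposal.
    public object ExtensionData;

    public int MyValue { get; set; }
}

public static class MyDataReader
{
    // Stand-in for generated deserialization code: known ids are assigned to
    // members, everything else is preserved instead of being ignored.
    public static MyData Read(IEnumerable<KeyValuePair<uint, object>> fields)
    {
        var result = new MyData();
        foreach (var field in fields)
        {
            switch (field.Key)
            {
                case 0:
                    result.MyValue = (int)field.Value;
                    break;
                default:
                    ExtensionDataHelper.AddUnknownField(ref result.ExtensionData, field.Key, field.Value);
                    break;
            }
        }

        return result;
    }
}
```

Keeping the member typed as `object` means the storage format stays an implementation detail of the helper, which matches the initial requirement that the member be declared as an object.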

ReubenBond, Jan 21 '21 14:01