icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

DataBake: split serialized form from runtime form

Open sffc opened this issue 3 years ago • 28 comments

A general DataBake feature that would solve several problems, including #2417, would be a way to specify a different data struct as the serialized form of the main data struct, with functions to convert between them.

For example, you might write

#[databake(path = foo)]
#[databake::serialize_as(path = foo::FooFields, from = from_fields, to = to_fields)]
pub struct Foo<'data> {
  // private fields (NOT processed by the proc macro)
}

#[databake(path = foo)]
pub struct FooFields {
  // public fields (processed by the proc macro)
}

impl<'data> Foo<'data> {
  pub const fn from_fields(fields: &'data FooFields) -> Self {
    // a cheap const constructor
  }
  pub fn to_fields(&self) -> FooFields {
    // invoked during serialization only
  }
}

CC @Manishearth @robertbastian

sffc avatar Aug 25 '22 20:08 sffc

The way I'd prefer to do this is:

#[databake(path = foo)]
#[databake::constructor = from_fields]
pub struct Foo<'data> {
  // private fields (processed by the proc macro)
}

impl<'data> Foo<'data> {
  // fields are in the same order as they are in Foo.
  // if order needs to change you must switch to fully manual
  pub const fn from_fields(field1: Field1Type, field2: Field2Type ) -> Self {
  }
}

We could also have both (serde does), but I consider having methods to be less API-polluting than having a new struct. Though in both cases we could doc(hidden) them (I'm more open to doc-hidden on methods and structs over fields)

Manishearth avatar Aug 25 '22 20:08 Manishearth

@Manishearth Your proposal is tied to the list of fields in Foo. This serves a more narrow use case. My proposal decouples the list of fields in Foo from the struct being serialized, which fixes the private fields issue but also solves other issues.

Also, you copied my comment

  // private fields (NOT processed by the proc macro)

which is exactly what your solution doesn't do.

sffc avatar Aug 25 '22 23:08 sffc

oops, yeah.

Your proposal does decouple things completely, but at the cost of a new public type; which I consider to be pretty high. I would rather ICU4X not be peppered with these fields types.

Manishearth avatar Aug 26 '22 05:08 Manishearth

Pros and cons of a separate fields type:

  1. Pro: Allows the type Foo to have a different data model / different private fields
  2. Pro: The data model of Foo can change without changing any public function signatures
  3. Pro: We could additionally consider FooFields as something that could evolve if we say that it is unstable (like the unstable constructors) or if we mark FooFields as #[doc(hidden)]
  4. Con: ICU4X peppered with more public internal types
  5. Con: A bit more boilerplate than Manish's proposal (but still less than custom Bake impls)

Did I miss anything?

sffc avatar Aug 26 '22 17:08 sffc

Another thought on the proposed convention for function names: from_baked_fields and to_baked_fields. These function names can be the default names so that they don't need to be listed in the derive. The standard boilerplate then becomes

#[databake(path = foo)]
#[databake::bake_as(foo::FooBakedFields)]
pub struct Foo<'data> {
  // private fields (NOT processed by the proc macro)
}

#[databake(path = foo)]
pub struct FooBakedFields {
  // public fields (processed by the proc macro)
}

impl<'data> Foo<'data> {
  pub const fn from_baked_fields(fields: &'data FooBakedFields) -> Self {
    // a cheap const constructor
  }
  pub fn to_baked_fields(&self) -> FooBakedFields {
    // invoked during serialization only
  }
}

sffc avatar Aug 26 '22 17:08 sffc

Thought: This would also enable us to perform validation (to the extent allowed in a const context) either by panicking in from_baked_fields or by returning a Result and making DataBake generate a panicky match statement.

Then we could get rid of some of our unsafe constructors, which would be nice.

Alternatively, we could handle the unsafe context as part of the macro, which would enable us to use the derive in more places.

sffc avatar Aug 26 '22 18:08 sffc

I mean, part of the thing is that we can have both these schemes, like serde does. I just don't want us to use from_fields in ICU4X. Perhaps with a hidden type, I guess.

The problem with having a naming convention is that that's super rigid since this is an impact on the public API. Unlike serde where everything can be module-private; any such function must be public; so I'd argue that we have to let the user choose the names. That's also why I want more flexibility in not requiring the user to use e.g. FooBakedFields or whatever. Everything of this form has an impact on the public API and that's quite heavy for what is essentially a plumbing function for a library.

Manishearth avatar Aug 26 '22 19:08 Manishearth

I'm not proposing that any of the names are fixed; they should be customizable. I'm only proposing that from_baked_fields and to_baked_fields seems like a reasonable convention to be used in absence of something better.

The functions and struct can be #[doc(hidden)] if desired.

sffc avatar Aug 26 '22 19:08 sffc

Hmm, fair. I think that would be okay for ICU4X.

Manishearth avatar Aug 26 '22 19:08 Manishearth

CC @robertbastian PTAL

sffc avatar Dec 16 '22 11:12 sffc

From the discussion so far I think this is hard to get right, and I don't really see a benefit. If structs have private fields they will have a custom serde implementations, so I don't think avoiding the custom Bake implementation is that important.

Validation is nice to have, but as long as we cannot validate ZeroVecs, this would not solve 99% of the unsafe calls in the generated code, and we just have to trust that it's correct.

robertbastian avatar Dec 16 '22 12:12 robertbastian

Discussion: @robertbastian: for Char16Trie, for example, we can do

#[bake(constructor = from_fields, args = data)]
pub struct Char16Trie<'data> {
    /// An array of u16 containing the trie data.
    #[cfg_attr(feature = "serde", serde(borrow))]
    data: ZeroVec<'data, u16>,
}

impl Char16Trie {
    pub fn from_fields(data: &'data [u16]) { ... }
}

sffc avatar Dec 16 '22 17:12 sffc

Discuss with:

  • @robertbastian
  • @sffc

Optional:

  • @Manishearth

sffc avatar Jun 15 '23 18:06 sffc