icu4x
icu4x copied to clipboard
DataBake: split serialized form from runtime form
A general DataBake feature that would solve several problems, including #2417, would be a way to specify a different data struct as the serialized form of the main data struct, with functions to convert between them.
For example, you might write
#[databake(path = foo)]
#[databake::serialize_as(path = foo::FooFields, from = from_fields, to = to_fields)]
pub struct Foo<'data> {
// private fields (NOT processed by the proc macro)
}
#[databake(path = foo)]
pub struct FooFields {
// public fields (processed by the proc macro)
}
impl<'data> Foo<'data> {
pub const fn from_fields(fields: &'data FooFields) -> Self {
// a cheap const constructor
}
pub fn to_fields(&self) -> FooFields {
// invoked during serialization only
}
}
CC @Manishearth @robertbastian
The way I'd prefer to do this is:
#[databake(path = foo)]
#[databake::constructor = from_fields]
pub struct Foo<'data> {
// private fields (processed by the proc macro)
}
impl<'data> Foo<'data> {
// fields are in the same order as they are in Foo.
// if order needs to change you must switch to fully manual
pub const fn from_fields(field1: Field1Type, field2: Field2Type ) -> Self {
}
}
We could also have both (serde does), but I consider having methods to be less API-polluting than having a new struct. Though in both cases we could doc(hidden) them (I'm more open to doc-hidden on methods and structs over fields)
@Manishearth Your proposal is tied to the list of fields in Foo. This serves a more narrow use case. My proposal decouples the list of fields in Foo from the struct being serialized, which fixes the private fields issue but also solves other issues.
Also, you copied my comment
// private fields (NOT processed by the proc macro)
which is exactly what your solution doesn't do.
oops, yeah.
Your proposal does decouple things completely, but at the cost of a new public type; which I consider to be pretty high. I would rather ICU4X not be peppered with these fields types.
Pros and cons of a separate fields type:
- Pro: Allows the type
Footo have a different data model / different private fields - Pro: The data model of
Foocan change without changing any public function signatures - Pro: We could additionally consider
FooFieldsas something that could evolve if we say that it is unstable (like the unstable constructors) or if we markFooFieldsas#[doc(hidden)] - Con: ICU4X peppered with more public internal types
- Con: A bit more boilerplate than Manish's proposal (but still less than custom Bake impls)
Did I miss anything?
Another thought on the proposed convention for function names: from_baked_fields and to_baked_fields. These function names can be the default names so that they don't need to be listed in the derive. The standard boilerplate then becomes
#[databake(path = foo)]
#[databake::bake_as(foo::FooBakedFields)]
pub struct Foo<'data> {
// private fields (NOT processed by the proc macro)
}
#[databake(path = foo)]
pub struct FooBakedFields {
// public fields (processed by the proc macro)
}
impl<'data> Foo<'data> {
pub const fn from_baked_fields(fields: &'data FooBakedFields) -> Self {
// a cheap const constructor
}
pub fn to_baked_fields(&self) -> FooBakedFields {
// invoked during serialization only
}
}
Thought: This would also enable us to perform validation (to the extent allowed in a const context) either by panicking in from_baked_fields or by returning a Result and making DataBake generate a panicky match statement.
Then we could get rid of some of our unsafe constructors, which would be nice.
Alternatively, we could handle the unsafe context as part of the macro, which would enable us to use the derive in more places.
I mean, part of the thing is that we can have both these schemes, like serde does. I just don't want us to use from_fields in ICU4X. Perhaps with a hidden type, I guess.
The problem with having a naming convention is that that's super rigid since this is an impact on the public API. Unlike serde where everything can be module-private; any such function must be public; so I'd argue that we have to let the user choose the names. That's also why I want more flexibility in not requiring the user to use e.g. FooBakedFields or whatever. Everything of this form has an impact on the public API and that's quite heavy for what is essentially a plumbing function for a library.
I'm not proposing that any of the names are fixed; they should be customizable. I'm only proposing that from_baked_fields and to_baked_fields seems like a reasonable convention to be used in absence of something better.
The functions and struct can be #[doc(hidden)] if desired.
Hmm, fair. I think that would be okay for ICU4X.
CC @robertbastian PTAL
From the discussion so far I think this is hard to get right, and I don't really see a benefit. If structs have private fields they will have a custom serde implementations, so I don't think avoiding the custom Bake implementation is that important.
Validation is nice to have, but as long as we cannot validate ZeroVecs, this would not solve 99% of the unsafe calls in the generated code, and we just have to trust that it's correct.
Discussion: @robertbastian: for Char16Trie, for example, we can do
#[bake(constructor = from_fields, args = data)]
pub struct Char16Trie<'data> {
/// An array of u16 containing the trie data.
#[cfg_attr(feature = "serde", serde(borrow))]
data: ZeroVec<'data, u16>,
}
impl Char16Trie {
pub fn from_fields(data: &'data [u16]) { ... }
}
Discuss with:
- @robertbastian
- @sffc
Optional:
- @Manishearth