Mini-RFC: Custom type wrappers.
Following #752 and other questions like #730, and in my opinion also the discussions surrounding pointer types like Arc, Box, etc., I hope we can agree that there is a need to allow users of prost to use their own types in the generated Rust code. Use cases include:
- validation wrappers: using things like ordered_float::NotNan to wrap an f64 (a protobuf double)
- performance optimizations, for example #752, to allow for zero-copy parsing
- allocation/indirection/semantics, for example either adding Box/Arc, or using Heapless types
I'm sure there are more uses, but these cover the basics. Getting prost-build to actually emit the right types is trivial. The question is where custom types fit with regard to encoding to and decoding from the wire.
@LucioFranco proposed here:
I think we could just have a config option that is just config.add_type_override(".package.msg.field", "foo:bar::Baz"); and its up to the user to ensure that type implements Message. I think that would support every use case?
However, trying to implement it that way makes it clear that there is a large semantic mismatch: the wrappers should be transparent with respect to the wire format, and primitive types (or wrappers around them) are not necessarily messages in their own right. For example, these 2 fields hold the same Rust type but are not encoded the same way.
#[derive(Clone, PartialEq, prost::Message)]
pub struct Example {
#[prost(message, optional, tag = "2")]
pub double_as_msg: ::core::option::Option<f64>,
#[prost(double, optional, tag = "1")]
pub double_as_double: ::core::option::Option<f64>,
}
If we serialize that struct (with both fields set to 42.0) and inspect the result with protoscope:
protoscope test.bin
1: 42.0 # 0x4045000000000000i64
2: {1: 42.0} # 0x4045000000000000i64
This is because prost implements Message for f64 as if it were google.protobuf.DoubleValue, which I think is a mistake: a plain f64 isn't a message, it is a field. Regardless, if we wanted to try the NotNan trick and implement Message for it:
struct NotNan(f64);
impl Message for NotNan {
fn encode_raw<B>(&self, buf: &mut B)
where
B: BufMut,
Self: Sized,
{
let inner = self.0;
// now what? using Message for f64 is not wire-transparent.
// inner.encode(buf)
// the plain encoding function requires a tag, which isn't available in this context
// prost::encoding::double::encode(TAG, &inner, buf)
}
fn merge_field<B>(
&mut self,
tag: u32,
wire_type: WireType,
buf: &mut B,
ctx: DecodeContext,
) -> Result<(), prost::DecodeError>
where
B: Buf,
Self: Sized,
{
// this is sort of doable, but ignores the tag value
let mut inner = 0.0f64;
prost::encoding::double::merge(wire_type, &mut inner, buf, ctx)?;
if inner.is_nan() {
// error
} else {
*self = NotNan(inner);
};
Ok(())
}
fn encoded_len(&self) -> usize {
// again, the message impl for f64 is wrong here because it hardcodes the tag to 1
// self.0.encoded_len()
// and the plain encoding function needs a tag
// prost::encoding::double::encoded_len(tag, &self.0);
}
fn clear(&mut self) {
*self = NotNan(0.0);
}
}
so this is not possible by simply using this trait, and of course doesn't account for repeated, packed encodings, etc.
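The two protoscope lines shown earlier can be reproduced by hand, which makes the mismatch concrete. The following is only a sketch using the standard library rather than prost itself; the helper names (put_key, encode_double_field, encode_double_as_message) are made up for illustration, and it only handles field numbers below 16 so that every key fits in one byte.

```rust
const WIRE_FIXED64: u8 = 1; // wire type used for `double`
const WIRE_LEN: u8 = 2; // wire type used for embedded messages

/// Append a one-byte key: (field number << 3) | wire type.
fn put_key(field: u32, wire_type: u8, out: &mut Vec<u8>) {
    assert!(field < 16, "sketch only handles single-byte keys");
    out.push(((field << 3) as u8) | wire_type);
}

/// A field encoded as a plain `double`: key, then 8 little-endian bytes.
fn encode_double_field(field: u32, value: f64, out: &mut Vec<u8>) {
    put_key(field, WIRE_FIXED64, out);
    out.extend_from_slice(&value.to_le_bytes());
}

/// A field encoded as a nested message that carries the double in its own
/// field 1, which is the DoubleValue-like shape shown for tag 2 above.
fn encode_double_as_message(field: u32, value: f64, out: &mut Vec<u8>) {
    let mut inner = Vec::new();
    encode_double_field(1, value, &mut inner);
    put_key(field, WIRE_LEN, out);
    assert!(inner.len() < 128, "sketch only handles one-byte lengths");
    out.push(inner.len() as u8);
    out.extend_from_slice(&inner);
}
```

Encoding 42.0 as field 1 with the first helper gives 9 bytes, while encoding it as field 2 with the second gives 11: the extra length prefix and the inner key are exactly the `{1: ...}` wrapping that protoscope displays.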
An alternative approach would be to add a new trait with explicit encoding functions, of the same form as those in the prost::encoding module:
trait FieldEncoding: Sized {
fn encode<B: BufMut>(tag: u32, value: &Self, buf: &mut B);
fn merge<B: Buf>(
wire_type: WireType,
value: &mut Self,
buf: &mut B,
ctx: DecodeContext,
) -> Result<(), DecodeError>;
fn encoded_len(&self, tag: u32) -> usize;
// etc, for repeated, packed
}
impl FieldEncoding for NotNan {
fn encode<B: BufMut>(tag: u32, value: &Self, buf: &mut B) {
prost::encoding::double::encode(tag, &value.0, buf)
}
fn merge<B: Buf>(
wire_type: WireType,
value: &mut Self,
buf: &mut B,
ctx: DecodeContext,
) -> Result<(), DecodeError> {
let mut inner = 0.0f64;
prost::encoding::double::merge(wire_type, &mut inner, buf, ctx)?;
if inner.is_nan() {
// error
} else {
*value = NotNan(inner);
};
Ok(())
}
fn encoded_len(&self, tag: u32) -> usize {
prost::encoding::double::encoded_len(tag, &self.0)
}
}
If we do that, we can have prost-derive emit the "proper" Message impl for a custom field. For example:
pub struct Example {
// #[prost(double, optional, tag = "2")]
pub double_as_wrapped: ::core::option::Option<NotNan>,
// #[prost(double, optional, tag = "1")]
pub double_as_double: ::core::option::Option<f64>,
}
would now emit:
impl ::prost::Message for Example {
#[allow(unused_variables)]
fn encode_raw<B>(&self, buf: &mut B)
where
B: ::prost::bytes::BufMut,
{
if let ::core::option::Option::Some(ref value) = self.double_as_double {
::prost::encoding::double::encode(1u32, value, buf);
}
if let ::core::option::Option::Some(ref value) = self.double_as_wrapped {
FieldEncoding::encode(2u32, value, buf)
}
}
#[allow(unused_variables)]
fn merge_field<B>(
&mut self,
tag: u32,
wire_type: ::prost::encoding::WireType,
buf: &mut B,
ctx: ::prost::encoding::DecodeContext,
) -> ::core::result::Result<(), ::prost::DecodeError>
where
B: ::prost::bytes::Buf,
{
const STRUCT_NAME: &'static str = stringify!(Example);
match tag {
1u32 => {
let mut value = &mut self.double_as_double;
::prost::encoding::double::merge(
wire_type,
value.get_or_insert_with(::core::default::Default::default),
buf,
ctx,
)
.map_err(|mut error| {
error.push(STRUCT_NAME, stringify!(double_as_double));
error
})
}
2u32 => {
let mut value = &mut self.double_as_wrapped;
FieldEncoding::merge(
wire_type,
value.get_or_insert_with(::core::default::Default::default),
buf,
ctx,
)
.map_err(|mut error| {
error.push(STRUCT_NAME, stringify!(double_as_wrapped));
error
})
}
_ => ::prost::encoding::skip_field(wire_type, tag, buf, ctx),
}
}
fn encoded_len(&self) -> usize {
0 + self.double_as_double.as_ref().map_or(0, |value| {
::prost::encoding::double::encoded_len(1u32, value)
}) + self
.double_as_wrapped
.as_ref()
.map_or(0, |value| FieldEncoding::encoded_len(value, 2u32))
}
fn clear(&mut self) {
self.double_as_double = ::core::option::Option::None;
self.double_as_wrapped = ::core::option::Option::None;
}
}
At that point the wrapper no longer shows up in the encoding, which yields (again, encoded and viewed with protoscope):
1: 42.0 # 0x4045000000000000i64
2: 42.0 # 0x4045000000000000i64
Both fields now have the same wire representation. This approach lets users define custom wrappers to use with their own types.
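To make the transparency claim easy to check without prost, here is a reduced sketch of the same idea. It is not the proposed trait itself: the buffers are simplified to Vec<u8> and &[u8], the DecodeError type is a local stand-in, merge takes only the payload after the key, and only single-byte keys are handled. All of those simplifications are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct NotNan(f64);

/// Local stand-in for a decode error; not prost's DecodeError.
#[derive(Debug, PartialEq)]
struct DecodeError(&'static str);

const WIRE_FIXED64: u8 = 1;

/// Simplified stand-in for the proposed FieldEncoding trait.
trait FieldEncoding: Sized {
    fn encode(tag: u32, value: &Self, out: &mut Vec<u8>);
    /// Decodes the payload that follows the key and returns the unread rest.
    fn merge<'a>(value: &mut Self, input: &'a [u8]) -> Result<&'a [u8], DecodeError>;
}

impl FieldEncoding for f64 {
    fn encode(tag: u32, value: &Self, out: &mut Vec<u8>) {
        assert!(tag < 16, "sketch only handles single-byte keys");
        out.push(((tag << 3) as u8) | WIRE_FIXED64);
        out.extend_from_slice(&value.to_le_bytes());
    }

    fn merge<'a>(value: &mut Self, input: &'a [u8]) -> Result<&'a [u8], DecodeError> {
        if input.len() < 8 {
            return Err(DecodeError("buffer underflow"));
        }
        let (head, rest) = input.split_at(8);
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(head);
        *value = f64::from_le_bytes(bytes);
        Ok(rest)
    }
}

impl FieldEncoding for NotNan {
    fn encode(tag: u32, value: &Self, out: &mut Vec<u8>) {
        // Delegate with the caller's tag, so the wrapper adds no bytes.
        <f64 as FieldEncoding>::encode(tag, &value.0, out);
    }

    fn merge<'a>(value: &mut Self, input: &'a [u8]) -> Result<&'a [u8], DecodeError> {
        let mut inner = 0.0f64;
        let rest = <f64 as FieldEncoding>::merge(&mut inner, input)?;
        if inner.is_nan() {
            // Validation lives in the wrapper; the target is left untouched.
            return Err(DecodeError("NaN is not allowed"));
        }
        *value = NotNan(inner);
        Ok(rest)
    }
}
```

Because the wrapper forwards the tag it was given, NotNan(42.0) and a plain 42.0 encoded under the same tag produce identical bytes, while decoding a NaN payload into a NotNan is rejected.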
An extension to this idea would be to allow users to supply the paths to free encoding functions via prost-build, similar to serde_with:
pub struct Example {
// #[prost(double, optional, tag = "2", encode_fn = "::some::encoder::func", merge_fn = "::some::decoder::func")]
pub double_as_wrapped: ::core::option::Option<NotNan>,
// #[prost(double, optional, tag = "1")]
pub double_as_double: ::core::option::Option<f64>,
}
which would also allow users to use external types they don't own.
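A rough sketch of why free functions help with types one doesn't own: nothing has to be implemented on the type itself, so no trait impl is needed. Everything here is hypothetical and std-only. Celsius stands in for a type from another crate, encode_celsius and merge_celsius are the kind of functions the encode_fn and merge_fn paths might point at, and the hand-written encode_raw only approximates what a derive could emit.

```rust
/// Hypothetical type standing in for one defined in a crate we do not own.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Celsius(pub f64);

/// Free function that an `encode_fn` path could point at (hypothetical).
pub fn encode_celsius(tag: u32, value: &Celsius, out: &mut Vec<u8>) {
    assert!(tag < 16, "sketch only handles single-byte keys");
    out.push(((tag << 3) as u8) | 1); // wire type 1 = 64-bit
    out.extend_from_slice(&value.0.to_le_bytes());
}

/// Free function that a `merge_fn` path could point at (hypothetical).
/// Reads the 8-byte payload after the key; returns false on underflow.
pub fn merge_celsius(value: &mut Celsius, payload: &[u8]) -> bool {
    match payload.get(..8) {
        Some(head) => {
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(head);
            value.0 = f64::from_le_bytes(bytes);
            true
        }
        None => false,
    }
}

pub struct Example {
    pub temperature: Option<Celsius>,
}

impl Example {
    /// Approximation of generated code: it calls the user-supplied
    /// function by path instead of a prost::encoding::* function.
    pub fn encode_raw(&self, out: &mut Vec<u8>) {
        if let Some(ref value) = self.temperature {
            encode_celsius(2, value, out);
        }
    }
}
```

The trade-off compared with the trait approach is that the function paths must be repeated for every field of that type, whereas a trait impl is written once per type.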
These are my 2 cents. I tested both of these approaches, and they both work. Is this a step we are willing to take? @LucioFranco, what do you say?
Hi, sorry for the delay on this. I started to review it, but I am going to need some time to really dive deep, since what you are proposing is a pretty big change. I just wanted to let you know it's on my todo list, but it will likely take me a bit given my other priorities atm.
I'm interested in this direction. I've been looking at replacing String fields with SmolStr (fairly similar to replacing with Arc<str>). Currently the SmolStrs need to be converted to/from Strings at the grpc edges of the service which is inefficient.
Are there already some forks around that can handle this?