prost
Support Binary Types (like UUID)
Motivation
There are a number of data types that have binary encodings which cannot be expressed in prost. The most obvious is UUID. An attempt was made to provide this functionality, albeit without discussion, in #637. While Protocol Buffers does not describe how to deal with UUID specifically, it does describe how to deal with binary values. RFC 9562 (formerly RFC 4122) very clearly defines how such a value is to be encoded in binary.
While I can appreciate the stance that consumers may use UUID in different ways, the binary value is the binary value. In the spirit of Protocol Buffers, it doesn't make sense to store the value in any other format aside from the binary representation. The smallest text representation takes twice the storage! Using or presenting a UUID in any other format, especially textually, should be external and unsupported by prost. If someone really, really wants it as a string, they can already do that by defining the field as string.
UUID isn't the only binary format that someone might want to encode. There are a number of well-known date and time encodings that use binary formats as opposed to strings. prost doesn't need to support them all, but it should be possible for someone else to provide an implementation. It would be nice if heavily used types, like uuid, were supported behind a feature. In my opinion, it's more than fair to say the only supported format is binary.
Desired State
What I would love to be able to do is use strongly-typed binary values in a message like so:
#[derive(Message)]
struct Something {
    #[prost(bytes)]
    id: uuid::Uuid,
}
Blockers
Given the popularity of certain types, it would certainly be convenient and impactful if prost provided an implementation, but I also understand it can't support everything. I'm more than willing to create my own adapter and I'm sure others would too, but unfortunately, it's impossible due to #829. Naturally, you can convert to Vec<u8> or Bytes, but that loses all of the idiomatic benefits of the type.
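To make the cost of the "convert to Vec<u8>" workaround concrete, here is a minimal sketch of what it tends to look like. It uses only the standard library: `Uuid16` is a hypothetical stand-in for `uuid::Uuid`, and `SomethingWire`, `Something`, and `InvalidLength` are illustrative names, not anything provided by prost. In real code the wire struct would carry the prost derive; it is omitted here so the sketch compiles on its own.

```rust
use std::convert::TryFrom;

// Hypothetical stand-in for `uuid::Uuid`, so the sketch needs no external crates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Uuid16([u8; 16]);

// The shape the message has to take today: the field is untyped bytes.
#[derive(Debug, Clone, PartialEq)]
struct SomethingWire {
    id: Vec<u8>,
}

// The strongly-typed shape the application actually wants to work with.
#[derive(Debug, Clone, PartialEq)]
struct Something {
    id: Uuid16,
}

// Error returned when the wire bytes are not exactly 16 bytes long.
#[derive(Debug, PartialEq)]
struct InvalidLength(usize);

// Typed -> wire is infallible: just copy the 16 bytes out.
impl From<Something> for SomethingWire {
    fn from(value: Something) -> Self {
        SomethingWire { id: value.id.0.to_vec() }
    }
}

// Wire -> typed must validate the length on every decode.
impl TryFrom<SomethingWire> for Something {
    type Error = InvalidLength;

    fn try_from(wire: SomethingWire) -> Result<Self, Self::Error> {
        let bytes = <[u8; 16]>::try_from(wire.id.as_slice())
            .map_err(|_| InvalidLength(wire.id.len()))?;
        Ok(Something { id: Uuid16(bytes) })
    }
}
```

Every message with such a field needs a second struct plus a pair of conversions like these, which is the boilerplate a custom adapter would remove.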
Prior Art
uuid-rs/uuid#716 provided an efficient way to encode/decode the value (more so than #637), but it was rejected. That was probably the right call because uuid depending on prost feels like an inverse dependency. Regardless, the desired end state would still be blocked by #829.
Feature Request
- [ ] Allow custom binary formats
- [ ] Provide support for well-known binary formats, such as UUID (as binary only)
I have some ideas about implementing encoder/decoder for specific types. This will increase the flexibility of field types. That could make it possible to use Uuid as field type.
However, this requires significant work (both coding and reviewing). If you can provide help (time or money), please let me know.
I don't mind contributing, if not just for my own selfish reasons. 😉 It initially looked like things might just need a refactoring of BytesAdapter, but it sounds like it might be something more comprehensive too. Regardless, if there's a clear path and design, I have zero problems giving back. I'm fine to flesh out the design however you like. This issue seems like a good place to start so it's easy to find for anyone that's interested in following along.
So the rough plan I am thinking of is:
- Make encoding/decoding a trait. I don't have a concrete design for that. It could look something like this: https://github.com/tokio-rs/prost/issues/903#issuecomment-2238700127 I have been slowly working on this idea on this branch: https://github.com/tokio-rs/prost/compare/master...caspermeijn:prost:encoding
- prost-derive should use those traits to select the encode/decode of the actual type in the struct. That way, it would be possible to use other Rust types than the default selection. Currently, derive attributes are required so that the specific encoding module can be selected.
- Implement encoding/decoding traits for Uuid
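As a rough illustration of the trait idea in the plan above, the following standard-library-only sketch shows how a trait could let any type be carried in a protobuf `bytes` field. Everything here is an assumption made for illustration: `BytesField`, `encode_field`, `decode_field`, and the `Uuid16` stand-in for `uuid::Uuid` are hypothetical names, and this is not the design from the linked comment or branch. Only the wire layout (varint key with wire type 2, varint length, payload) follows the Protocol Buffers encoding.

```rust
use std::convert::TryFrom;

// Hypothetical trait, NOT part of prost: a type that can live in a `bytes` field.
trait BytesField: Sized {
    fn to_wire(&self) -> Vec<u8>;
    fn from_wire(raw: &[u8]) -> Result<Self, String>;
}

// The default selection: plain bytes pass through unchanged.
impl BytesField for Vec<u8> {
    fn to_wire(&self) -> Vec<u8> { self.clone() }
    fn from_wire(raw: &[u8]) -> Result<Self, String> { Ok(raw.to_vec()) }
}

// Hypothetical stand-in for `uuid::Uuid`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Uuid16([u8; 16]);

impl BytesField for Uuid16 {
    fn to_wire(&self) -> Vec<u8> { self.0.to_vec() }
    fn from_wire(raw: &[u8]) -> Result<Self, String> {
        <[u8; 16]>::try_from(raw)
            .map(Uuid16)
            .map_err(|_| format!("expected 16 bytes, got {}", raw.len()))
    }
}

// Append `value` to `buf` as a base-128 varint.
fn put_varint(mut value: u64, buf: &mut Vec<u8>) {
    while value >= 0x80 {
        buf.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

// Read a varint from the front of `buf`; returns (value, bytes consumed).
fn get_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

// Wire type 2 is "length-delimited" in the Protocol Buffers encoding.
const WIRE_TYPE_LEN: u64 = 2;

// Encode one field: key, payload length, then the payload bytes.
fn encode_field<T: BytesField>(field_number: u32, value: &T, buf: &mut Vec<u8>) {
    let payload = value.to_wire();
    put_varint((u64::from(field_number) << 3) | WIRE_TYPE_LEN, buf);
    put_varint(payload.len() as u64, buf);
    buf.extend_from_slice(&payload);
}

// Decode one field from the front of `buf`; returns (field number, typed value).
fn decode_field<T: BytesField>(buf: &[u8]) -> Result<(u32, T), String> {
    let (key, key_len) = get_varint(buf).ok_or("truncated key")?;
    if (key & 0x7) != WIRE_TYPE_LEN {
        return Err(format!("unexpected wire type {}", key & 0x7));
    }
    let (len, len_len) = get_varint(&buf[key_len..]).ok_or("truncated length")?;
    let start = key_len + len_len;
    let end = start
        .checked_add(len as usize)
        .filter(|&e| e <= buf.len())
        .ok_or("truncated payload")?;
    let value = T::from_wire(&buf[start..end])?;
    Ok(((key >> 3) as u32, value))
}
```

With something along these lines, a derive macro could call `encode_field` and `decode_field` for whatever type the struct declares, and a downstream crate could implement the trait for its own types without prost knowing about them. The sketch allocates a temporary `Vec<u8>` per field for simplicity; a real design would likely write into the buffer directly.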
Maybe we can have an audio meeting using Discord to work out a plan together. You can find me in the prost channel: https://discord.gg/tokio
Thanks @commonsensesoftware for filing such a thoughtful ticket on this issue.
@caspermeijn thanks for sharing your thoughts as well. In your rough plan, you mention stabilizing some encoding functions in support of this and that there's a decent amount of work. I'm also willing to jump in and support this issue, but none of the linked tickets here have been updated in quite some time. What do you see as the next steps?