dipa
dipa copied to clipboard
Stop defaulting to using enums as deltas, use bitflagged structs with custom Serialize and Deserialize instead
Right now if a struct has a small number of fields we default to using an enum to encode the delta.
Something like:
// We don't use generics in the generated code, just for illustration.
enum Delta3<A, B, C> {
NoChange,
Change_0(A),
Change_1(B),
Change_2(C),
Change_0_1(A, B),
Change_1_2(B, C),
Change_0_2(A, C),
Change_0_1_2(A, B, C),
The idea behind this was to be able to only transmit exactly what has changed at the cost of a single byte of overhead (the enum variant).
The downside of the enum approach is that it can't support more than a few fields. If you had 10 fields you would more than 2^10 (1024) enum variants to encode every combination of fields changing.
There is also some code complexity in needing to generate diffing and patching functions that have match blocks that match on every possible variant.
We can accomplish the 1 byte overhead without the downsides of the enum approach.
Something like:
struct MyStructDelta {
changed_fields: u8, // Encode changed fields into the bits
field_a: Option<DeltaTypeForFieldA>,
field_b: Option<DeltaTypeForFieldB>,
}
We keep a single u8
within which we can use the 8 bits to encode whether or not up to 8 fields in the struct have changed.
If a struct has more than 8 fields we can use a u16
to encode changes.
Technically the enum approach would only need 1 byte overhead even if you have more than 8 fields, but I suspect that compile times would begin to suffer. This needs some investigation so that we can gauge the impact on compile times. We can leave the enum approach around as a non default approach and document our findings on the compile time impact.
From there we would generate a custom Serialize and Deserialize implementation for MyStructDelta
. If a field is Option::None
, we skip it when serializing the struct to a map.
When deserializing, we look at the changed_fields
bits to know which fields to deserialize the map to.
Implementation
We would add a new FieldBatchingStrategy
(probably needs to be renamed to something that better communicates that all it is is the way that we want to generate the delta. Perhaps DeltaFormat
) variant, perhaps Self::BitflaggedStruct
. We'd then add new tests and code to dipa-derive
and dipa-derive-test
in order to support this new way of generating deltas.
Then we'd update the documentation to explain the format and why it is the default.
Maybe a downside worth mentioning is that the u*
implementation would be limited to 64/128 fields for a given struct. Could be something being mentioned in the docs as a traid-off between compilation time and number of supported fields maybe 🤔
I've been looking for a crate like this for some time, even started to work on one myself. I'll just use yours :)
That's a good point. There are already some compile time errors when you have too many fields for the enum based deltas such as:
https://github.com/chinedufn/dipa/blob/adf05864e8151b343c3f2102cd32fab31c51ebfb/crates/dipa-derive-test/src/all_tests/ui/max_delta_batch_too_large.stderr#L1-L7
So yeah we'd have documentation and also a compile time error that tells you to use #[dipa(delta_format = "bitflagged")]
or whatever we end up calling it.
Actually sorry - there wouldn't be a field limit.
We are totally free to use multiple u*
For example, if there are 22 fields we would use a u16
and a u8
Or, if it made the logic simpler, we may just use only u8
s. So 127 fields would just be 16 u8s, for example.
Actually sorry - there wouldn't be a field limit.
We are totally free to use multiple
u*
For example, if there are 22 fields we would use a
u16
and au8
Or, if it made the logic simpler, we may just use only
u8
s. So 127 fields would just be 16 u8s, for example.
Maybe I'm misunderstanding something here but wouldn't you need only 3 u8
s?
For the 22 field example?
You need to be able to say which of the 22 fields are included in the serialized format. This would require 22 bits of information. A single u8
would only give us 8 bits of information to work with.
But 1 u8 would be totally fine for structs with 8 or fewer fields. Hopefully that makes sense.
For the 22 field example?
You need to be able to say which of the 22 fields are included in the serialized format. This would require 22 bits of information. A single
u8
would only give us 8 bits of information to work with.But 1 u8 would be totally fine for structs with 8 or fewer fields. Hopefully that makes sense.
Yes, of course.
That was a mistake; I meant 3 u8
s.