dipa icon indicating copy to clipboard operation
dipa copied to clipboard

Stop defaulting to using enums as deltas, use bitflagged structs with custom Serialize and Deserialize instead

Open chinedufn opened this issue 3 years ago • 6 comments

Right now if a struct has a small number of fields we default to using an enum to encode the delta.

Something like:

// We don't use generics in the generated code, just for illustration.
enum Delta3<A, B, C> {
    NoChange,
    Change_0(A),
    Change_1(B),
    Change_2(C),
    Change_0_1(A, B),
    Change_1_2(B, C),
    Change_0_2(A, C),
    Change_0_1_2(A, B, C),

The idea behind this was to be able to only transmit exactly what has changed at the cost of a single byte of overhead (the enum variant).

The downside of the enum approach is that it can't support more than a few fields. If you had 10 fields you would more than 2^10 (1024) enum variants to encode every combination of fields changing.

There is also some code complexity in needing to generate diffing and patching functions that have match blocks that match on every possible variant.

We can accomplish the 1 byte overhead without the downsides of the enum approach.

Something like:

struct MyStructDelta {
    changed_fields: u8, // Encode changed fields into the bits
    field_a: Option<DeltaTypeForFieldA>,
    field_b: Option<DeltaTypeForFieldB>,
}

We keep a single u8 within which we can use the 8 bits to encode whether or not up to 8 fields in the struct have changed.

If a struct has more than 8 fields we can use a u16 to encode changes.

Technically the enum approach would only need 1 byte overhead even if you have more than 8 fields, but I suspect that compile times would begin to suffer. This needs some investigation so that we can gauge the impact on compile times. We can leave the enum approach around as a non default approach and document our findings on the compile time impact.


From there we would generate a custom Serialize and Deserialize implementation for MyStructDelta. If a field is Option::None, we skip it when serializing the struct to a map.

When deserializing, we look at the changed_fields bits to know which fields to deserialize the map to.


Implementation

We would add a new FieldBatchingStrategy (probably needs to be renamed to something that better communicates that all it is is the way that we want to generate the delta. Perhaps DeltaFormat) variant, perhaps Self::BitflaggedStruct. We'd then add new tests and code to dipa-derive and dipa-derive-test in order to support this new way of generating deltas.

Then we'd update the documentation to explain the format and why it is the default.

chinedufn avatar Apr 12 '21 13:04 chinedufn

Maybe a downside worth mentioning is that the u* implementation would be limited to 64/128 fields for a given struct. Could be something being mentioned in the docs as a traid-off between compilation time and number of supported fields maybe 🤔

I've been looking for a crate like this for some time, even started to work on one myself. I'll just use yours :)

mainrs avatar Apr 12 '21 14:04 mainrs

That's a good point. There are already some compile time errors when you have too many fields for the enum based deltas such as:

https://github.com/chinedufn/dipa/blob/adf05864e8151b343c3f2102cd32fab31c51ebfb/crates/dipa-derive-test/src/all_tests/ui/max_delta_batch_too_large.stderr#L1-L7

So yeah we'd have documentation and also a compile time error that tells you to use #[dipa(delta_format = "bitflagged")] or whatever we end up calling it.

chinedufn avatar Apr 12 '21 14:04 chinedufn

Actually sorry - there wouldn't be a field limit.

We are totally free to use multiple u*

For example, if there are 22 fields we would use a u16 and a u8

Or, if it made the logic simpler, we may just use only u8s. So 127 fields would just be 16 u8s, for example.

chinedufn avatar Apr 13 '21 15:04 chinedufn

Actually sorry - there wouldn't be a field limit.

We are totally free to use multiple u*

For example, if there are 22 fields we would use a u16 and a u8

Or, if it made the logic simpler, we may just use only u8s. So 127 fields would just be 16 u8s, for example.

Maybe I'm misunderstanding something here but wouldn't you need only 3 u8s?

toothbrush7777777 avatar Apr 14 '21 21:04 toothbrush7777777

For the 22 field example?

You need to be able to say which of the 22 fields are included in the serialized format. This would require 22 bits of information. A single u8 would only give us 8 bits of information to work with.

But 1 u8 would be totally fine for structs with 8 or fewer fields. Hopefully that makes sense.

chinedufn avatar Apr 14 '21 22:04 chinedufn

For the 22 field example?

You need to be able to say which of the 22 fields are included in the serialized format. This would require 22 bits of information. A single u8 would only give us 8 bits of information to work with.

But 1 u8 would be totally fine for structs with 8 or fewer fields. Hopefully that makes sense.

Yes, of course. That was a mistake; I meant 3 u8s.

toothbrush7777777 avatar Apr 14 '21 22:04 toothbrush7777777