binrw icon indicating copy to clipboard operation
binrw copied to clipboard

Unit enums have different behavior from normal enums

Open jam1garner opened this issue 2 years ago • 3 comments

Normal enum:

    enum MagicTest {
        Zero,

        #[brw(magic = 1u8)]
        One,

        #[brw(magic = 2u8)]
        Two(()),
    }

Unit enum:

    enum MagicTest {
        Zero,

        #[brw(magic = 1u8)]
        One,

        #[brw(magic = 2u8)]
        Two,
    }

For the normal enum, this unconditionally returns MagicTest::Zero, while for the unit enum MagicTest::Zero is never returned.

Since binrw's model for enums is effectively using the same semantics as match (evaluated in-order, with untagged unit enums being effectively the wildcard pattern), this means the behavior for unit enums is what diverges from intended.

jam1garner avatar Apr 16 '22 20:04 jam1garner

Copied from the above commit:

Note this doesn't change the status of https://github.com/jam1garner/binrw/issues/116, however it does mitigate it. It might be worth just documenting that unit enums have less strict ordering guarantees?

jam1garner avatar Apr 16 '22 21:04 jam1garner

Yeah, because of the grouping by type done in PR #115, the order of matching is not the same as the order of elements in the enum. Moreover, if I recall correctly, I used a HashMap for grouping by type, which is not really deterministic. Additionally, since the matching is done with multiple match statements (one for each magic type), there's probably no warning when using the same magic value but with different types, like this:

enum Test {
    #[brw(magic = 1u8)]
    OneU8,
    #[brw(magic = b"\x01")]
    OneByte
}

Maybe translating everything into byte arrays would make it easier to handle such cases?

MrNbaYoh avatar Apr 18 '22 10:04 MrNbaYoh

Tbh I'm not sure we can actually optimize this in the general case. I'm thinking the best path forward is:

  • For non-trivial unit enums, fall back to standard codegen instead of optimizing it

  • trivial unit enums can be defined as the following:

    1. Does not include magics of different types or sizes
    2. May include a mix of both magic and unconditional variants
    3. May not include endianess declarations on individual variants
  • trivial unit enums have the following codegen strategy:

    1. read the magic value type, if any. if none is present a fixed value of () will be used
    2. match on the read value.
    3. For variants without magic _ will be the match pattern and the branch itself will include a reverse seek the size of the magic (if no variants have magic this is unnecessary)
    4. Match arms for variants with pre_asserts will include if guards

At this point I would consider this to purely be an optimization focused at (a) magic enums with bytes as the magic type (can't use repr enums) and (b) optimization of pre_assert chains*

*except not ones where they have associated data :( don't think we can even make it work using match for sum types without if/let guards being stable

I personally don't think we can optimize any other cases without getting the user to opt out of ordering guarantees. The only other case that can be optimized is unit enums with variable length magics and fixed endianess. If endianess is variable we cannot do the case overlap checking needed to do the size-based groupings we currently do*.

* technically we can prove that neither endianess can cause any overlaps or check that grouping can be done while maintaining order of priority for both endianesses, but to me this is a special case of a special case of a special case and not worth it

Alternatively we can relax ordering guarantees for a fixed set of unit enums, but to me this is not ideal as the design inconsistency becomes something you have to learn about from docs.

jam1garner avatar Apr 19 '22 16:04 jam1garner