Support for Enums with more than 256 variants

Open ZJaume opened this issue 1 year ago • 9 comments

Are there any plans on supporting enums larger than u8? I'm currently developing a language identifier that has 238 variants for the language codes and will probably surpass that amount in the future when more languages are added.

Also, I found the error a bit confusing

error: enums with more than 256 variants are not supported
  --> src/main.rs:10:10
   |
10 | pub enum Lang {
   |          ^^^^

because it does not say anything about what is causing the error. So I did not know if it was a limitation from the language, the strum derives or the bitcode derives.

Thank you very much for your library. It was one of the main optimizations I made to my tool, and it cut the model loading time by more than half.

ZJaume avatar Sep 23 '24 11:09 ZJaume

The main reason I limited enums to 256 variants is that bitcode 0.6 requires calculating a histogram of all the variants of each enum while deserializing. With at most 256 variants, a [usize; 256] can be used to calculate the histogram, which is quite fast. For an arbitrary number of variants, a HashMap<u32, usize> is required, which isn't fast.
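To make the trade-off concrete, here is a minimal sketch of the two histogram strategies. It is not bitcode's actual code: the function names `histogram_u8` and `histogram_u32` are hypothetical, and the input is assumed to be a slice of variant indices that have already been read.

```rust
use std::collections::HashMap;

/// Counts occurrences of each variant index when every index fits in a u8.
/// The table has a fixed size, so each update is a single array access.
/// (Hypothetical helper, for illustration only.)
fn histogram_u8(variant_indices: &[u8]) -> [usize; 256] {
    let mut counts = [0usize; 256];
    for &index in variant_indices {
        counts[index as usize] += 1;
    }
    counts
}

/// Counts occurrences when variant indices can be arbitrarily large.
/// Each update has to hash the key, which costs more than an array access.
/// (Hypothetical helper, for illustration only.)
fn histogram_u32(variant_indices: &[u32]) -> HashMap<u32, usize> {
    let mut counts = HashMap::new();
    for &index in variant_indices {
        *counts.entry(index).or_insert(0) += 1;
    }
    counts
}
```

For example, `histogram_u8(&[0, 2, 2, 255])` yields a table with a count of 2 in slot 2, while `histogram_u32(&[0, 70_000, 70_000])` yields a map with two entries.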

caibear avatar Sep 23 '24 18:09 caibear

> requires calculating a histogram

Perhaps an exception can be made for C-style enums, i.e. those that lack fields in their variants, which don't need a histogram.

finnbear avatar Sep 23 '24 19:09 finnbear

Is a [usize; 65536] too much for that histogram?

ZJaume avatar Sep 24 '24 07:09 ZJaume

> Is a [usize; 65536] too much for that histogram?

Yes, because that would require iterating over all 65536 elements to see which ones are nonzero, which is significant overhead. This would only be faster than a HashMap<u32, usize> when calculating a histogram of a very large message, where the one-time cost of iteration does not dominate the runtime.
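The fixed scan cost can be sketched as follows. This is an illustration under assumptions, not bitcode's implementation: `nonzero_counts_u16` is a hypothetical name, and a heap-allocated `Vec` stands in for the [usize; 65536] to keep the example off the stack.

```rust
/// Builds a dense histogram over all possible u16 variant indices and returns
/// the (index, count) pairs that actually occurred, in ascending index order.
/// (Hypothetical helper, for illustration only.)
fn nonzero_counts_u16(variant_indices: &[u16]) -> Vec<(u16, usize)> {
    // Heap-allocated stand-in for the [usize; 65536] discussed above.
    let mut counts = vec![0usize; 1 << 16];
    for &index in variant_indices {
        counts[index as usize] += 1;
    }
    // This scan visits all 65536 slots, even if the message held one value.
    counts
        .iter()
        .enumerate()
        .filter(|(_, count)| **count != 0)
        .map(|(index, count)| (index as u16, *count))
        .collect()
}
```

The counting loop scales with the message, but the final scan always touches every slot, so for small messages the scan is most of the work.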

@finnbear does bring up a good point: histograms are only required for non-C-style enums, so we could support u16 C-style enums more easily.
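As a rough illustration of why the C-style case is simpler: when no variant carries fields, a value is fully described by its index, so it can round-trip through a plain u16. The sketch below assumes a hypothetical three-variant `Lang` enum with hypothetical `to_index` and `from_index` helpers; it does not use or reflect bitcode's API.

```rust
/// Hypothetical C-style enum: no variant carries fields.
#[repr(u16)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Lang {
    En = 0,
    Es = 1,
    Ca = 2,
}

impl Lang {
    /// The discriminant alone identifies the value.
    fn to_index(self) -> u16 {
        self as u16
    }

    /// Maps an index back to a variant, rejecting unknown indices.
    fn from_index(index: u16) -> Option<Lang> {
        match index {
            0 => Some(Lang::En),
            1 => Some(Lang::Es),
            2 => Some(Lang::Ca),
            _ => None,
        }
    }
}
```

With a u16 index, the same pattern would cover up to 65536 variants; a real implementation would generate the `match` arms in the derive macro rather than by hand.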

caibear avatar Sep 24 '24 16:09 caibear

Has there been any progress on supporting u16 C-style enums? I'm not familiar with the bitcode source code, but I could try to implement it if it's easy and you point me in the right direction.

Edit: without serde

ZJaume avatar Feb 10 '25 14:02 ZJaume

> Has there been any progress on supporting u16 C-style enums?

No, we're not currently working on it.

finnbear avatar Feb 10 '25 19:02 finnbear

The next feature we plan to add is the ability to use serde to encode a field within the Encode macro. This provides a solution to many edge cases at once. After that we can revisit this issue and see if it's still necessary.

caibear avatar Feb 10 '25 23:02 caibear

hello,

I just came across this when trying to add bitcode to give it a spin, but it would not work for an enum with more than 256 variants, which, in this case, stopped me in my tracks. The enum in my case is a C-style enum with #[repr(u16)], so I was hoping to find some kind of escape hatch that would let me specify that it should just be treated as a u16 integer for Encode/Decode.

Thanks for your work on this library! The benchmark results are impressive, eager to try it out soon.

jstrong-lhava avatar Aug 23 '25 01:08 jstrong-lhava

Hi @jstrong-lhava; thanks for your comment! This provides motivation for supporting C-style enums with more than 256 variants. Unfortunately, there is no escape hatch other than #[bitcode(skip)] on any un-encodable fields that you don't need to encode.

finnbear avatar Aug 23 '25 02:08 finnbear