spicy Add native uint24 support

Working on DHCPv6, there's a 24bit transaction ID: https://datatracker.ietf.org/doc/html/rfc8415#section-8

My current "hack" for parsing is as follows:

    transaction_id: uint8[3] &convert=uint32((uint32($$[0]) << 16) + (uint32($$[1]) << 8) + uint32($$[2]));

MySQL also has a 24bit packet length field: https://dev.mysql.com/doc/dev/mysql-server/8.4.3/page_protocol_basic_packets.html#sect_protocol_basic_packets_packet

Zeek's Spicy SSL parser apparently also has a need for uint24: https://github.com/zeek/zeek/blob/master/src/analyzer/protocol/ssl/spicy/SSL.spicy#L968

type uint24 = unit {
    a: bytes &size=3;
} &convert=$$.a.to_uint(spicy::ByteOrder::Big);

This version likey has quite some overhead due to "bytes" and the to_uint() call.

The SMB protocol's message length is also using 24bits: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-smb2/1dfacde4-b5c7-4494-8a14-a09d3ab4cc83

This seems sufficient examples in popular protocols to motivate native uint24 support in Spicy. Thoughts?

Nov 29 '24 10:11 awelzel

What about arbitrary bit-width integers? I've been using a lot of Zig and I've loved that feature, specifically for cases like this (for example I used u56 here)

it's not absolutely necessary, but I could see that being really nice for Spicy too.

I have no idea how I'd implement that, though :)

Nov 29 '24 15:11 evantypanski

What about arbitrary bit-width integers?

:-)

For arbitrary width integer parsing, bitfields might be a reasonable and existing mechanism, actually.

As a clarification, this isn't about supporting 24bit or arbitrary-width integer arithmetic (seems esoteric), but having a short-cut to parse 3 bytes into an integer.

Nov 30 '24 16:11 awelzel

For reference, we've also had a request for uint128 (but no ticket yet I believe).

That said, I don't think I want to do arbitrary bit-width integers. That'd be lot of work (in particular making sure they interact nicely with current std sizes, and performance doesn't suffer), with not that much actual demand afaics.

but having a short-cut to parse 3 bytes into an integer.

I would indeed be reluctant to introduce a full uint24 as well, but could see a short-cut for unpacking it. Do you have an idea for syntax? It doesn't fit nicely with the current unpack unfortunately as that expects a known target type (see https://docs.zeek.org/projects/spicy/en/latest/programming/language/packing.html#packing).

Dec 02 '24 09:12 rsmmr

Do you have an idea for syntax?

Grml, I guess my hope was to allow just x: uint24 as a field specification, but then promoting it to uint32 behind the scenes. But yeah, that's not "clean" if there's no actual uint24 type.

Maybe some lifted bitfield'ish notation instead?

   x: uint24;  # for comparison
   x: uint32:24`  # parse 24bits (number needs to be a multiple of 8 and less-or-equal of given type) with the current endian setting into a uint32
   x: uint32::24`
   x: uint32[24]`
   x: uint32{24}`
   x: uint32 &width=24  # Benjamin mentioned something like that and it might just be the best fit outside of having uint24.

Dec 02 '24 09:12 awelzel

Do you have an idea for syntax?

Maybe some lifted bitfield'ish notation instead?

The "bitfield'ish notation" looks like a new thing to me, would maybe an optional &width attribute which takes the number of bits (bytes?) for parsing of integers integrate better, e.g.,

: uint8 &width=8;
# equivalent to
: uint8;

or for the original issue

transaction_id: uint32 &width=24;

I think this should also integrate nicely with &byte-order.

EDIT:

Actually we already have &size (which takes number of bytes). Currently we require that an integer consumes all bytes specified ~in that attribute~ for its "natural width", but we might be able to relax that, e.g.,

: uint32 &size=3; # ParseError: expecting 4 bytes for unpacking value (3 bytes available)

Dec 02 '24 09:12 bbannier

: uint32 &size=3; # ParseError: expecting 4 bytes for unpacking value (3 bytes available)

I like that, that looks most natural to me. The size couldn't be more than sizeof(type), but anything smaller would be fine. Well, except that I'm not sure how byte order would be applied in general ...

Dec 02 '24 10:12 rsmmr

I really like, &size, too!

Well, except that I'm not sure how byte order would be applied in general ...

Wouldn't filling the int from left to right and right-shift by (sizeof(type)-&size)*8 bits ~before~ after byte-order-flipping be reasonable?

Dec 02 '24 12:12 awelzel