binstruct

Support Bit Packed Bools

Open MiddleMan5 opened this issue 1 year ago • 5 comments

Hey, I've been using this library and it's great!

One limitation I ran into using it is that the binary format I'm working with encodes bools as single bits instead of an entire byte. Maybe offer a mode to extract individual bits out as bools?

MiddleMan5, Jun 13 '24

Hi, thanks for your interest in the project!

I thought about this, but there is a problem: what should happen to the remaining bits? Just ignore them?

For now, one option is to define your own flag type with bit constants. You can also add methods to the new type to get the bool values more conveniently.

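```go
package main

import (
	"fmt"
	"log"

	"github.com/ghostiam/binstruct"
)

// MyDataFlags holds eight bit-packed flags in a single byte.
type MyDataFlags uint8

const (
	MyDataFlag1 MyDataFlags = 1 << iota // 0b00000001
	MyDataFlag2                         // 0b00000010
	MyDataFlag3
	MyDataFlag4
	MyDataFlag5
	MyDataFlag6
	MyDataFlag7
	MyDataFlag8 // 0b10000000
)

// HasFlag1 reports whether bit 0 is set.
func (f MyDataFlags) HasFlag1() bool {
	return f&MyDataFlag1 != 0
}

// HasFlag2 reports whether bit 1 is set.
func (f MyDataFlags) HasFlag2() bool {
	return f&MyDataFlag2 != 0
}

// etc...

type MyData struct {
	Flags MyDataFlags // decoded as a whole byte, then inspected bit by bit
}

func main() {
	data := []byte{0b01010101}
	var actual MyData
	err := binstruct.UnmarshalBE(data, &actual)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(actual.Flags & MyDataFlag1) // 1
	fmt.Println(actual.Flags & MyDataFlag2) // 0

	// or

	fmt.Println(actual.Flags.HasFlag1()) // true
	fmt.Println(actual.Flags.HasFlag2()) // false
}
```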

ghostiam, Jun 13 '24

Thanks for the example code! Yeah, that's basically the approach we're taking now. The downside is that we have to track a "bit offset" manually, outside the decoding code, which is kind of a pain. Bools packed as single bits are pretty common in a lot of the binary formats I've worked with, so I think this is a valid use case.

One possible implementation would involve tracking the total bit offset internally instead of the byte offset. As you pop byte-aligned chunks off the stream to decode, this offset would be incremented by 8 per byte. When decoding, you could check whether the current offset is a multiple of 8 bits and, if so, keep the current logic.
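A minimal sketch of that bookkeeping, assuming a hypothetical BitReader type that is not part of binstruct: the position is a total bit offset, whole-byte reads add 8 to it, and the existing byte-oriented path is used whenever the offset is a multiple of 8. Bits are read MSB-first here, which is also an assumption.

```go
package main

import "fmt"

// BitReader is a hypothetical reader (not part of binstruct) that tracks
// its position as a total bit offset instead of a byte offset.
type BitReader struct {
	data   []byte
	bitPos int // total number of bits consumed so far
}

// aligned reports whether the current position sits on a byte boundary.
func (r *BitReader) aligned() bool { return r.bitPos%8 == 0 }

// ReadBit returns the next bit, most significant bit first.
func (r *BitReader) ReadBit() (bool, error) {
	if r.bitPos >= len(r.data)*8 {
		return false, fmt.Errorf("underflow: no bits left")
	}
	b := r.data[r.bitPos/8]
	shift := 7 - uint(r.bitPos%8)
	r.bitPos++
	return (b>>shift)&1 == 1, nil
}

// ReadByte keeps the plain byte-indexed logic when aligned and falls back
// to assembling the byte bit by bit when the offset is mid-byte.
func (r *BitReader) ReadByte() (byte, error) {
	if r.aligned() {
		idx := r.bitPos / 8
		if idx >= len(r.data) {
			return 0, fmt.Errorf("underflow: no bytes left")
		}
		r.bitPos += 8 // a whole byte advances the offset by 8 bits
		return r.data[idx], nil
	}
	// Check before consuming anything so a failed read leaves the position intact.
	if len(r.data)*8-r.bitPos < 8 {
		return 0, fmt.Errorf("underflow: fewer than 8 bits left")
	}
	var v byte
	for i := 0; i < 8; i++ {
		bit, _ := r.ReadBit() // cannot fail: length was checked above
		v <<= 1
		if bit {
			v |= 1
		}
	}
	return v, nil
}

func main() {
	r := &BitReader{data: []byte{0b1011_0000, 0b1111_0000}}
	flag, _ := r.ReadBit()            // first bit: 1
	b, _ := r.ReadByte()              // unaligned path: bits 1..8
	fmt.Println(flag, b, r.aligned()) // true 97 false
}
```

After the single-bit read the offset is 1, so the byte read takes the slow path and the reader stays unaligned afterwards.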

Decoding bit-packed bool fields would be a little different: after popping off a single bit, the bit offset would no longer be a multiple of 8. Any subsequent operation would need to read the correct number of bits from the stream and re-align them to the target data type.
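To make the "read the correct number of bits and re-align them" step concrete, here is a small sketch. The readBits helper and errUnderflow are hypothetical names, not binstruct API; the helper reads n bits (MSB-first) from an arbitrary bit offset and right-aligns them in a uint64, which a decoder could then convert to the field's type.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnderflow is returned when fewer bits remain than were requested.
var errUnderflow = errors.New("underflow")

// readBits is a hypothetical helper: it reads n (0..64) bits starting at
// bit offset pos, MSB-first, right-aligns them in a uint64, and returns
// the value together with the new bit offset.
func readBits(data []byte, pos, n int) (uint64, int, error) {
	if n < 0 || n > 64 {
		return 0, pos, fmt.Errorf("invalid bit count %d", n)
	}
	if len(data)*8-pos < n {
		return 0, pos, errUnderflow // same rule whether aligned or not
	}
	var v uint64
	for i := 0; i < n; i++ {
		p := pos + i
		bit := (data[p/8] >> (7 - uint(p%8))) & 1
		v = v<<1 | uint64(bit) // shift previous bits up, append the new one
	}
	return v, pos + n, nil
}

func main() {
	data := []byte{0xAB, 0xCD} // 1010_1011 1100_1101
	flag, pos, _ := readBits(data, 0, 1)     // a 1-bit bool
	field, pos, _ := readBits(data, pos, 12) // a 12-bit field spanning both bytes
	_, _, err := readBits(data, pos, 7)      // only 3 bits left
	fmt.Println(flag, field, pos, err)       // 1 1401 13 underflow
}
```

The last call shows the underflow behaviour described below: asking for more bits than remain fails the same way an aligned read past the end would.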

I think offering a "read n bits" function would also help, since it lets the user choose to drop the remaining bits.

Underflow logic would be the same for aligned and unaligned reads. If you try to read a byte and only 7 bits remain, a regular underflow error occurs (the struct definition or the input is wrong).

I might have time to open up an example PR if you're interested, what do you think?

MiddleMan5, Jun 17 '24

Thanks for the detailed description! Yes, I would be glad to see an example. Is there an example of an open or popular protocol/data format that uses bit offsets?

I think we can add such functionality, but we need an explicit way to switch to unaligned mode (maybe a tag meaning "read bits and remain unaligned", like bits:3,unalign) and to return to aligned mode, which would discard any unread bits. Ideally, we'd add an option to NewDecoder/Unmarshal, but the library doesn't support options at the moment :( (it's a planned improvement).
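A rough sketch of those tag semantics follows. Everything here is hypothetical: binstruct does not currently support a bits option, and parseBitSpec, decodeBits and Header are illustrative names. A field tagged unalign leaves the position mid-byte; any other field returns to aligned mode afterwards, discarding the leftover bits. Untagged fields are read as a whole byte, and only bool and unsigned fields are handled, to keep it short.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"strings"
)

// bitSpec holds options from a hypothetical `bin:"bits:N,unalign"` tag.
type bitSpec struct {
	bits    int  // number of bits to read (0 = untagged, read a whole byte)
	unalign bool // stay unaligned after this field
}

// parseBitSpec understands only "bits:N" and "unalign"; other options are ignored.
func parseBitSpec(tag string) (bitSpec, error) {
	var s bitSpec
	for _, opt := range strings.Split(tag, ",") {
		opt = strings.TrimSpace(opt)
		switch {
		case opt == "unalign":
			s.unalign = true
		case strings.HasPrefix(opt, "bits:"):
			n, err := strconv.Atoi(strings.TrimPrefix(opt, "bits:"))
			if err != nil || n < 1 || n > 64 {
				return s, fmt.Errorf("bad bits option %q", opt)
			}
			s.bits = n
		}
	}
	return s, nil
}

// decodeBits fills the exported bool/unsigned fields of the struct pointed to by v.
func decodeBits(data []byte, v interface{}) error {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Ptr || rv.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("expected pointer to struct")
	}
	rv = rv.Elem()
	rt := rv.Type()
	pos := 0 // bit offset
	for i := 0; i < rt.NumField(); i++ {
		spec, err := parseBitSpec(rt.Field(i).Tag.Get("bin"))
		if err != nil {
			return err
		}
		n := spec.bits
		if n == 0 {
			n = 8
		}
		if len(data)*8-pos < n {
			return fmt.Errorf("underflow reading field %s", rt.Field(i).Name)
		}
		var val uint64
		for j := 0; j < n; j++ {
			p := pos + j
			val = val<<1 | uint64(data[p/8]>>(7-uint(p%8))&1)
		}
		pos += n
		if !spec.unalign {
			pos = (pos + 7) / 8 * 8 // back to aligned mode: drop unread bits
		}
		f := rv.Field(i)
		switch f.Kind() {
		case reflect.Bool:
			f.SetBool(val != 0)
		case reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
			f.SetUint(val)
		default:
			return fmt.Errorf("unsupported kind %s", f.Kind())
		}
	}
	return nil
}

type Header struct {
	Enabled bool  `bin:"bits:1,unalign"`
	Mode    uint8 `bin:"bits:3,unalign"`
	Level   uint8 `bin:"bits:2"` // last bit field: re-align afterwards
	Length  uint8
}

func main() {
	var h Header
	err := decodeBits([]byte{0b1_101_10_11, 0x2A}, &h)
	fmt.Printf("%+v %v\n", h, err) // {Enabled:true Mode:5 Level:2 Length:42} <nil>
}
```

In the example the two trailing bits of the first byte (0b11) are discarded when Level switches back to aligned mode, so Length is read from the second byte.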

> One possible implementation would involve tracking the total bit offset instead of byte offset internally.

I think I would still keep the byte offset, but add extra fields to track the bit shift. As long as the bit shift is 0, we can use the same logic as we have now. But I'll think about it some more.
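A sketch of that alternative layout, under the assumption of a hypothetical byteBitReader (not binstruct code): the existing byte offset stays, a separate shift field counts the bits already consumed in the current byte, and the current logic runs unchanged whenever shift is 0.

```go
package main

import (
	"errors"
	"fmt"
)

var errUnderflow = errors.New("underflow")

// byteBitReader is a hypothetical layout: the existing byte offset plus a
// bit shift (0..7) within the current byte.
type byteBitReader struct {
	data  []byte
	off   int  // byte offset, as tracked today
	shift uint // bits already consumed in data[off]; 0 means aligned
}

// readUint8 uses the existing byte logic while shift is 0 and only does
// shifting work when the reader is mid-byte.
func (r *byteBitReader) readUint8() (uint8, error) {
	if r.shift == 0 {
		if r.off >= len(r.data) {
			return 0, errUnderflow
		}
		v := r.data[r.off]
		r.off++
		return v, nil
	}
	if r.off+1 >= len(r.data) {
		return 0, errUnderflow // need the rest of this byte plus the next one
	}
	// Unread low bits of data[off] become the high bits; the top `shift`
	// bits of data[off+1] fill the low end. The shift itself is unchanged.
	v := r.data[r.off]<<r.shift | r.data[r.off+1]>>(8-r.shift)
	r.off++
	return v, nil
}

// readBool consumes a single bit, MSB first.
func (r *byteBitReader) readBool() (bool, error) {
	if r.off >= len(r.data) {
		return false, errUnderflow
	}
	bit := r.data[r.off]>>(7-r.shift)&1 == 1
	r.shift++
	if r.shift == 8 { // consumed the whole byte: aligned again
		r.shift = 0
		r.off++
	}
	return bit, nil
}

func main() {
	r := &byteBitReader{data: []byte{0b1011_0000, 0b1111_0000}}
	b, _ := r.readBool()
	v, _ := r.readUint8()
	_, err := r.readUint8()                // 7 bits left: not enough for a byte
	fmt.Println(b, v, r.off, r.shift, err) // true 97 1 1 underflow
}
```

One appeal of this design is that the byte offset keeps its current meaning, so byte-based features keep working as long as shift is 0.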

ghostiam, Jun 18 '24

I like the idea of explicitly switching between unaligned and aligned access modes and discarding the leftover bits on the transition, yeah!

The codebase I've worked with most recently that provides generic binary encoding/decoding with support for non-byte-aligned fields is openc3. It's written in Python and Ruby and is by no means easy to read, but I'll include it here for reference. Specifically, see the packet item accessors that handle converting input fields to binary representations: https://github.com/OpenC3/cosmos/blob/a3f4b9a3ccb9097fd9d1a73645886ad60c83f754/openc3/python/openc3/accessors/binary_accessor.py#L157

Unfortunately, the protocols I work with are proprietary, but off the top of my head, the public ones I know of that allow non-byte-aligned packing are:

  • Cap'n Proto - https://github.com/capnproto/capnproto
  • Thrift - https://github.com/apache/thrift

MiddleMan5, Jun 19 '24

I thought about what functionality is needed to introduce bit offsets:

  • BE/LE tags should behave like MSB/LSB when working with bits (or add explicit aliases?);
  • Support for signed numbers when converting from bits;
  • What to do with offset? It's in bytes; should it reset to aligned mode?
  • I think it's worth reading several bytes at once into a buffer such as a uint64 (reading only 7 bytes, so 1 byte remains free for shifted bits), which makes it easier to shift bits across several bytes at a time.
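Three of the points above can be sketched together. The helpers loadBuffer, bitsMSB, bitsLSB and signExtend are hypothetical names: up to 7 bytes are loaded into a uint64 (top byte left free), BE/LE loading is paired with MSB-first/LSB-first bit extraction, and a signed field is recovered by two's-complement sign extension. This is one possible mapping of BE/LE to bit order, not a decided design.

```go
package main

import "fmt"

// loadBuffer packs up to 7 bytes starting at data[off] into the low 56 bits
// of a uint64, big-endian or little-endian. The top byte stays free.
func loadBuffer(data []byte, off int, littleEndian bool) uint64 {
	var buf uint64
	for i := 0; i < 7 && off+i < len(data); i++ {
		b := uint64(data[off+i])
		if littleEndian {
			buf |= b << (8 * uint(i)) // first byte lands in the lowest bits
		} else {
			buf |= b << (8 * uint(6-i)) // first byte lands in bits 48..55
		}
	}
	return buf
}

// bitsMSB extracts n bits that start `shift` bits from the top of the
// 56-bit buffer (MSB-first bit order). Requires shift+n <= 56.
func bitsMSB(buf uint64, shift, n uint) uint64 {
	return (buf >> (56 - shift - n)) & (1<<n - 1)
}

// bitsLSB extracts n bits that start `shift` bits from the bottom of the
// buffer (LSB-first bit order).
func bitsLSB(buf uint64, shift, n uint) uint64 {
	return (buf >> shift) & (1<<n - 1)
}

// signExtend interprets the low n bits of v as a two's-complement number.
func signExtend(v uint64, n uint) int64 {
	m := 64 - n
	return int64(v<<m) >> m // arithmetic shift copies the sign bit down
}

func main() {
	data := []byte{0xB4} // 1011_0100
	msb := bitsMSB(loadBuffer(data, 0, false), 0, 3) // first 3 bits from the top: 101
	lsb := bitsLSB(loadBuffer(data, 0, true), 0, 3)  // first 3 bits from the bottom: 100
	fmt.Println(msb, signExtend(msb, 3))             // 5 -3
	fmt.Println(lsb, signExtend(lsb, 3))             // 4 -4
}
```

The same byte yields different 3-bit fields depending on bit order, which is why the tag semantics for BE/LE (or explicit MSB/LSB aliases) need to be pinned down.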

I will add more as ideas and questions arise.

ghostiam, Jun 20 '24