kaitai_struct

Bit-sized integers allow strange syntax

Open: Mingun opened this issue 2 years ago • 4 comments

The following definitions are valid, but it seems they should be forbidden:

seq:
  - id: zeroes
    type: b00000008 # == b8
  - id: zero
    type: b0 # should be forbidden
  - id: unnecessary_be
    type: b1be
  - id: unnecessary_le
    type: b1le
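For illustration, here is a minimal Python sketch of how a name like b00000008 can end up meaning the same as b8. It assumes the compiler recognizes bit-sized types with a pattern along the lines of "b, a decimal width, optional be/le suffix" and converts the width with an ordinary integer parse. The pattern and the parse_bit_type helper are hypothetical and are not taken from the KSC source.

```python
import re

# Hypothetical pattern for bit-sized integer type names (not the real KSC code):
# "b", then a decimal width, then an optional "be"/"le" suffix.
BIT_TYPE_RE = re.compile(r"b(\d+)(be|le)?")


def parse_bit_type(name: str):
    """Return (width, bit_endian or None) for names like 'b8' or 'b1le'."""
    m = BIT_TYPE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a bit-sized integer type: {name!r}")
    width = int(m.group(1))  # int() silently drops leading zeros and accepts 0
    return width, m.group(2)


for name in ["b00000008", "b8", "b0", "b1be", "b1le"]:
    print(name, parse_bit_type(name))
```

Under that assumption, a stricter width pattern such as [1-9][0-9]* would reject both b00000008 and b0, which is essentially what forbidding them would amount to.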

Mingun, Dec 18 '21 15:12

No, I don't think they must be forbidden.

KOLANICH, Dec 19 '21 11:12

I'm with @KOLANICH here; I think the first two are totally OK.

Some people may prefer to zero-pad the widths so that they line up visually, for example:

seq:
  - type: b01
  - type: b53

I don't see anything wrong with that, and I don't think that forbidding it brings something positive.


The following

  - id: zero
    type: b0

is also legitimate. We allow size: 0, which may also be useful (as well as repeat-expr: 0, etc.), so why not type: b0.

One use that comes to my mind is when you want to force the compiler to insert an _io.alignToByte() call without actually parsing anything - that could look like this:

 - type: b0
 - size: 0

Sure, it's a bit of a hack, but a potentially useful one :)
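A minimal sketch of why this trick works, assuming a simplified bit reader. The class and its method names loosely follow the Kaitai Struct runtime's read_bits_int_be and align_to_byte, but this is a hypothetical model, not the runtime code. Reading zero bits consumes nothing, and the alignment performed before the following byte-aligned size: 0 field discards the rest of a partially read byte.

```python
class BitReader:
    """Hypothetical, simplified model of a bit-level stream reader."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0      # index of the next byte that is not fully consumed
        self.bit_pos = 0  # bits already consumed from data[self.pos], 0..7

    def read_bits_int_be(self, n: int) -> int:
        """Read n bits, starting at the most significant unread bit."""
        res = 0
        for _ in range(n):  # n == 0 (type: b0) reads nothing and returns 0
            bit = (self.data[self.pos] >> (7 - self.bit_pos)) & 1
            res = (res << 1) | bit
            self.bit_pos += 1
            if self.bit_pos == 8:
                self.pos += 1
                self.bit_pos = 0
        return res

    def align_to_byte(self) -> None:
        """Discard the unread bits of a partially consumed byte."""
        if self.bit_pos != 0:
            self.pos += 1
            self.bit_pos = 0

    def read_bytes(self, n: int) -> bytes:
        """Byte-aligned read; callers must align first."""
        assert self.bit_pos == 0, "byte read on an unaligned position"
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise EOFError("not enough data")
        self.pos += n
        return chunk


r = BitReader(bytes([0b10100000, 0xFF]))
a = r.read_bits_int_be(3)  # some earlier bit field: 0b101 == 5
z = r.read_bits_int_be(0)  # type: b0 -> 0, consumes nothing
r.align_to_byte()          # what a following byte-aligned field triggers
empty = r.read_bytes(0)    # size: 0 -> b""
print(a, z, empty, r.pos)  # 5 0 b'' 1
```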


This situation

 - id: unnecessary_be
   type: b1be
 - id: unnecessary_le
   type: b1le

is mentioned in the User Guide (https://doc.kaitai.io/user_guide.html#bit-endian):

Big-endian and little-endian bit integers can follow only on a byte boundary. They can’t share the same byte. Joining them on an unaligned bit position is undefined behavior, and future versions of KSC will throw a compile-time error if they detect such a situation.

It is definitely planned to make the compiler detect this, see https://github.com/kaitai-io/kaitai_struct/issues/155#issuecomment-718239225.
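To make the planned check concrete, here is a minimal sketch of how such a compile-time check could work. The function name and its (id, width, bit-endian) input are hypothetical, not the KSC implementation, and it only tracks a run of consecutive bit-sized fields (a byte-aligned field in between would put the offset back on a byte boundary).

```python
from typing import List, Optional, Tuple


def find_unaligned_endian_switch(fields: List[Tuple[str, int, str]]) -> Optional[str]:
    """Return the id of the first field that switches bit-endianness at a
    non-byte-aligned bit offset, or None if there is no such field.

    fields: (id, width_in_bits, "be" or "le") for consecutive bit-sized fields.
    """
    bit_offset = 0
    prev_endian = None
    for field_id, width, endian in fields:
        if prev_endian is not None and endian != prev_endian and bit_offset % 8 != 0:
            return field_id  # be and le would share the same byte
        bit_offset += width
        prev_endian = endian
    return None


print(find_unaligned_endian_switch([("a", 4, "be"), ("b", 4, "le")]))  # b
print(find_unaligned_endian_switch([("a", 8, "be"), ("b", 4, "le")]))  # None
print(find_unaligned_endian_switch([("x", 1, "le")]))                  # None
```

A lone b1le field, as in the original example, has nothing to clash with, so a check of this kind would not flag it.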

generalmimon, Dec 19 '21 12:12

I don't see anything wrong with that, and I don't think that forbidding it brings something positive

That seems valuable, but it is inconsistent with the fact that s01, u0008 and so on are not synonyms for s1 and u8. Such differences in behaviour give the impression of poor language design.

We allow size: 0, which may also be useful (as well as repeat-expr: 0, etc.), so why not type: b0.

I believe that is only because nobody thought about these corner cases (as shown by the fact that negative values are not prohibited either). In my opinion, we should investigate possible uses of these strange definitions and forbid them if none are found. If they can be used as workarounds, it is better to investigate what the workaround is trying to do and introduce a dedicated syntax for that.

It is definitely planned to make the compiler detect this,

This example is not about two fields following each other; they were put into one type only as examples of definitions. Treat the type as containing only one of the specified fields. What does it mean to define the bit order of a one-bit field?

Mingun, Dec 19 '21 15:12

@Mingun:

What does it mean to define the bit order of a one-bit field?

I highly recommend reading https://doc.kaitai.io/user_guide.html#_bit_sized_integers.

Note that in the context of Kaitai Struct, the term bit-endian dictates the "parsing start and direction". In this case, the be/le suffix of b1 determines where the field is parsed within the enclosing byte of the stream: starting at the byte's most significant bit (bit-endian: be) or at its least significant bit (bit-endian: le).

It does not determine the numbering of the parsed bits. In fact, that numbering is the same for both bit-endian: be and bit-endian: le: notice that in all diagrams in Bit-sized integers (User Guide), the integer bits are always in the order a4, a3, ..., a0, regardless of whether the big- or little-endian parsing direction is used.
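To illustrate the difference between parsing direction and bit numbering, here is a small Python model of the two directions described in the User Guide. It is a bit-by-bit sketch written for clarity (the function names are hypothetical); the actual runtime works on whole bytes with shifts and masks. The same byte yields different values for b3be and b3le, yet in both cases the value's bits a2, a1, a0 keep the order in which they appear in the byte.

```python
def read_bits_be(data: bytes, bit_offset: int, n: int) -> int:
    """bit-endian: be. Offset 0 is the MSB of data[0]; the first bit read
    becomes the most significant bit of the result."""
    res = 0
    for i in range(bit_offset, bit_offset + n):
        byte_idx, k = divmod(i, 8)
        bit = (data[byte_idx] >> (7 - k)) & 1
        res = (res << 1) | bit
    return res


def read_bits_le(data: bytes, bit_offset: int, n: int) -> int:
    """bit-endian: le. Offset 0 is the LSB of data[0]; the first bit read
    becomes the least significant bit of the result."""
    res = 0
    for j, i in enumerate(range(bit_offset, bit_offset + n)):
        byte_idx, k = divmod(i, 8)
        bit = (data[byte_idx] >> k) & 1
        res |= bit << j
    return res


one = bytes([0b10110110])
print(read_bits_be(one, 0, 3))  # 5: bits 7..5 = 101
print(read_bits_le(one, 0, 3))  # 6: bits 2..0 = 110

two = bytes([0xAB, 0xCD])
print(hex(read_bits_be(two, 0, 12)))  # 0xabc
print(hex(read_bits_le(two, 0, 12)))  # 0xdab
```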

tl;dr:

seq:
  - id: p
    type: b1be
             d[0]
7   6   5   4   3   2   1   0
p0
┬─
p

a)

Offset (h)  00        01        ...
               8e₁₆
00000000    10001110₂
            └─ p = true

b)

Offset (h)  00        01        ...
               7f₁₆
00000000    01111111₂
            └─ p = false

seq:
  - id: q
    type: b1le
             d[0]
7   6   5   4   3   2   1   0
                            q0
                            ┬─
                            q

a)

Offset (h)  00        01        ...
               8e₁₆
00000000    10001110₂
                   └─ q = false

b)

Offset (h)  00        01        ...
               01₁₆
00000000    00000001₂
                   └─ q = true
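The four cases above can be checked with a tiny Python sketch. The helper names are hypothetical; each simply picks the single bit that the diagram assigns to the field (bit 7 for b1be, bit 0 for b1le).

```python
def read_b1be(byte: int) -> bool:
    # b1be: the field occupies the most significant bit (bit 7) of the byte
    return bool((byte >> 7) & 1)


def read_b1le(byte: int) -> bool:
    # b1le: the field occupies the least significant bit (bit 0) of the byte
    return bool(byte & 1)


cases = [
    ("p", read_b1be, 0x8E),  # a) 10001110
    ("p", read_b1be, 0x7F),  # b) 01111111
    ("q", read_b1le, 0x8E),  # a) 10001110
    ("q", read_b1le, 0x01),  # b) 00000001
]
for name, reader, byte in cases:
    print(f"{name} from {byte:08b}: {reader(byte)}")
```

The same byte 0x8e gives p = true but q = false, which is exactly what the be/le suffix changes for a one-bit field.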

generalmimon, Dec 19 '21 16:12