kaitai_struct
Bit-sized integers allow strange syntax
These definitions are valid, but it seems they should be forbidden:
seq:
  - id: zeroes
    type: b00000008 # == b8
  - id: zero
    type: b0 # should be forbidden
  - id: unnecessary_be
    type: b1be
  - id: unnecessary_le
    type: b1le
No, it doesn't seem they must be forbidden.
I'm with @KOLANICH here, I think that the first two are totally OK.
Some people may prefer padding numbers with leading zeros so that they line up on the right, for example:
seq:
  - type: b01
  - type: b53
I don't see anything wrong with that, and I don't think that forbidding it brings something positive.
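To illustrate why leading zeros are harmless here, below is a minimal Python sketch. It assumes the width of a bit-sized type is just the decimal digits after `b` parsed as an integer. The `parse_bit_type` helper and its regex are hypothetical and written only for this comment; they are not the compiler's actual code.

```python
import re

# Hypothetical pattern: "b", decimal digits, optional bit-endian suffix.
BIT_TYPE = re.compile(r"b(\d+)(be|le)?")


def parse_bit_type(name):
    """Return (width, bit_endian) for a bit-sized type name, or None."""
    m = BIT_TYPE.fullmatch(name)
    if m is None:
        return None
    # int() ignores leading zeros, so "00000008" and "8" give the same width.
    return int(m.group(1)), m.group(2)


for name in ("b00000008", "b8", "b01", "b53", "b0", "b1le"):
    print(name, parse_bit_type(name))
```

Under that assumption, `b01` is simply another spelling of `b1`, and `b0` falls out as a zero-width field without any special casing.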
The following
  - id: zero
    type: b0
is also legitimate. We allow size: 0, which may also be useful (as well as repeat-expr: 0, etc.), so why not type: b0?
One use that comes to my mind is when you want to force the compiler to insert an _io.alignToByte() call without actually parsing anything. That could look like this:
- type: b0
- size: 0
Sure, it's a bit of a hack, but a potentially useful one :)
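A rough Python sketch of what the hack achieves, assuming a big-endian bit reader that buffers leftover bits. `ToyStream` and its methods are hypothetical stand-ins written for this comment, not the real runtime API. The explicit `align_to_byte()` call plays the role of the call the compiler would insert between the bit-sized field and the byte-sized one.

```python
class ToyStream:
    """Toy big-endian bit reader; a hypothetical stand-in for the runtime stream."""

    def __init__(self, data):
        self.data = data
        self.pos = 0        # index of the next whole byte
        self.bits = 0       # value of the leftover bits of a partially read byte
        self.bits_left = 0  # number of leftover bits

    def read_bits_be(self, n):
        # Pull in whole bytes until enough bits are buffered.
        while self.bits_left < n:
            self.bits = (self.bits << 8) | self.data[self.pos]
            self.pos += 1
            self.bits_left += 8
        self.bits_left -= n
        value = self.bits >> self.bits_left
        self.bits &= (1 << self.bits_left) - 1
        return value

    def align_to_byte(self):
        # Drop leftover bits so the next read starts on a byte boundary.
        self.bits = 0
        self.bits_left = 0

    def read_bytes(self, n):
        # Toy assumption: byte reads do not look at the buffered bits.
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk


stream = ToyStream(bytes([0b10100000, 0x42]))
flags = stream.read_bits_be(3)    # an earlier 3-bit field
nothing = stream.read_bits_be(0)  # "- type: b0" consumes no bits
stream.align_to_byte()            # the call the hack is meant to trigger
empty = stream.read_bytes(0)      # "- size: 0" consumes no bytes
next_byte = stream.read_bits_be(8)
print(flags, nothing, empty, hex(next_byte))
```

Without the `align_to_byte()` call, the final 8-bit read in this toy model would start with the 5 leftover bits of the first byte instead of starting at the second byte.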
This situation
  - id: unnecessary_be
    type: b1be
  - id: unnecessary_le
    type: b1le
is mentioned in the User Guide (https://doc.kaitai.io/user_guide.html#bit-endian):
Big-endian and little-endian bit integers can follow only on a byte boundary. They can’t share the same byte. Joining them on an unaligned bit position is undefined behavior, and future versions of KSC will throw a compile-time error if they detect such a situation.
It is definitely planned to make the compiler detect this, see https://github.com/kaitai-io/kaitai_struct/issues/155#issuecomment-718239225.
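For illustration only, here is a Python sketch of how such a check could work, following the rule quoted from the User Guide. It assumes every field width is known statically and that unsuffixed bit types default to big-endian. `find_unaligned_switches` and its regex are hypothetical; this is not the check planned for the compiler.

```python
import re

# Hypothetical pattern for bit-sized type names.
BIT_TYPE = re.compile(r"b(\d+)(be|le)?")


def find_unaligned_switches(types, default_endian="be"):
    """Return indexes of fields that change bit-endian off a byte boundary."""
    problems = []
    bit_pos = 0     # bits consumed since the last byte boundary, modulo 8
    current = None  # bit-endian of the partially consumed byte, if any
    for i, name in enumerate(types):
        m = BIT_TYPE.fullmatch(name)
        if m is None:
            # Not a bit-sized type: assume it starts on a fresh byte boundary.
            bit_pos, current = 0, None
            continue
        width = int(m.group(1))
        endian = m.group(2) or default_endian
        if bit_pos != 0 and current is not None and endian != current:
            problems.append(i)
        bit_pos = (bit_pos + width) % 8
        current = endian if bit_pos != 0 else None
    return problems


print(find_unaligned_switches(["b4be", "b4le"]))  # switch in the middle of a byte
print(find_unaligned_switches(["b8be", "b8le"]))  # switch on a byte boundary
```

A type containing only one `b1be` or only one `b1le` field passes this sketch, since there is no switch to detect.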
I don't see anything wrong with that, and I don't think that forbidding it brings something positive
That seems valuable, but it is inconsistent with the fact that s01, u0008 and so on are not synonyms for s1 and u8. Such differences in behaviour give the impression of poor language design.
We allow size: 0, which may also be useful (as well as repeat-expr: 0, etc.), so why not type: b0?
I believe that this is only because nobody thought about these corner cases (as shown by the fact that negative values are also not prohibited). In my opinion, we should investigate possible uses of these strange definitions and forbid them if none are found. If they can be used for workarounds, it is better to work out what each workaround is trying to do and introduce a dedicated syntax for it.
It is definitely planned to make the compiler detect this,
This example is not about two fields following each other; they were put into one type only as examples of definitions. Treat the type as containing only one of the specified fields. What does it mean to define the bit order of a one-bit field?
@Mingun:
What does it mean to define the bit order of a one-bit field?
I highly recommend reading https://doc.kaitai.io/user_guide.html#_bit_sized_integers.
Note that in the context of Kaitai Struct, the term bit-endian dictates the "parsing start and direction". In this case, the be/le suffix of b1 affects the position where the field is parsed within the enclosing byte in the stream: it starts either on the stream byte's most significant bit (bit-endian: be) or on the least significant bit (bit-endian: le).
It does not refer to the bit numbering of the parsed bits. In fact, that numbering is the same for both bit-endian: be and bit-endian: le. Notice that on all diagrams in Bit-sized integers (User Guide), the integer bits are always in the order a4, a3, ..., a0, regardless of whether the big-endian or little-endian parsing direction is used.
tl;dr:
seq:
  - id: p
    type: b1be
d[0] (bits 7 6 5 4 3 2 1 0): p occupies bit 7 (p0), the most significant bit.
a) Offset (h) 00000000: byte 8e (hex) = 10001110 (binary)
   └─ p = true
b) Offset (h) 00000000: byte 7f (hex) = 01111111 (binary)
   └─ p = false
seq:
  - id: q
    type: b1le
d[0] (bits 7 6 5 4 3 2 1 0): q occupies bit 0 (q0), the least significant bit.
a) Offset (h) 00000000: byte 8e (hex) = 10001110 (binary)
   └─ q = false
b) Offset (h) 00000000: byte 01 (hex) = 00000001 (binary)
   └─ q = true
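The same four cases can be reproduced with a few lines of Python. This is a minimal sketch of the semantics described above for a one-bit field read at the start of a fresh byte. `read_b1` is a hypothetical helper written for this comment, not part of any Kaitai Struct runtime.

```python
def read_b1(byte, bit_endian):
    """Read a one-bit field from the start of a fresh byte."""
    if bit_endian == "be":
        return bool((byte >> 7) & 1)  # be starts at the most significant bit
    if bit_endian == "le":
        return bool(byte & 1)         # le starts at the least significant bit
    raise ValueError("bit_endian must be 'be' or 'le'")


for byte in (0x8E, 0x7F, 0x01):
    p = read_b1(byte, "be")
    q = read_b1(byte, "le")
    print(f"{byte:#04x} = {byte:08b}: b1be -> {p}, b1le -> {q}")
```

So even for a single bit, the suffix matters: it selects which end of the byte the bit is taken from.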