kaitai_struct
When using string literals in equality operators, should the size affect whether the equality is successful?
I am testing with the ksy definition below and the attached text file.
meta:
  id: try
seq:
  - id: thing
    type: str
    encoding: UTF-16LE
    size: 20
instances:
  is_last_parti:
    value:
      thing == "last_parti"
When size is set to 20, the value instance is true. If the size is set to 22 or higher, the value instance is false, even though the string does not continue past 20 bytes.
Also, with size: 22, changing the literal to "last_parti\0\0" makes it false, but "last_parti\0" makes it true, even though \0 is only one byte in octal notation? It's true with "last_parti\u0000", as expected. But for longer strings it's a PITA to keep appending \u0000 at the end. UTF-16 doesn't work with terminators, otherwise I'd use that.
If the string does not use the whole buffer defined for it, should string comparison with a literal fail?
Have you tried strz instead of str?
@KOLANICH OP said that the string is in UTF-16, so they can't use strz because of #187.
@humdogm
> When size is set to 20, the value instance is true. If the size is set to 22 or higher, the value instance is false even though the string does not continue past 20.
With type: str, there is no zero-termination: the string is always exactly as long as the size says. (In this case, because it's UTF-16, size: 20 gives a string of length 10, size: 22 gives length 11, etc.) If there are zero bytes at the end, they are decoded like any other character (as U+0000) and kept in the final string.
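A minimal sketch in plain Python (not the Kaitai Struct runtime) of what a fixed-size UTF-16LE str field does: it decodes exactly size bytes, so the length is size / 2 code units and trailing zero bytes survive as U+0000 characters. The buffer DATA and the helper read_fixed_str are hypothetical, built only to mimic the file described above.

```python
# Hypothetical file contents: "last_parti" in UTF-16LE followed by zero padding.
DATA = "last_parti".encode("utf-16-le") + b"\x00" * 6  # 26 bytes


def read_fixed_str(data: bytes, size: int, encoding: str = "utf-16-le") -> str:
    """Mimic a fixed-size str field: decode exactly `size` bytes, keep every zero."""
    return data[:size].decode(encoding)


for size in (20, 22, 24):
    s = read_fixed_str(DATA, size)
    # Each UTF-16 code unit here takes 2 bytes, so the length is size // 2.
    print(size, len(s), repr(s))
# 20 10 'last_parti'
# 22 11 'last_parti\x00'
# 24 12 'last_parti\x00\x00'
```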
So if you use size: 20, Kaitai Struct will read the string "last_parti", and with size: 22 it will read "last_parti\0". Kaitai Struct strings in memory have an explicit length and are not zero-terminated, so these two strings are not equal: they have different lengths.
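The comparison can be sketched the same way. Python strings, like Kaitai Struct strings, carry an explicit length, so a trailing U+0000 makes two otherwise identical strings unequal. This is only an analogy in plain Python, and the byte string raw is hypothetical.

```python
# Hypothetical 22-byte field: "last_parti" in UTF-16LE plus one zero code unit.
raw = "last_parti".encode("utf-16-le") + b"\x00\x00"

field_20 = raw[:20].decode("utf-16-le")  # what size: 20 would hold
field_22 = raw[:22].decode("utf-16-le")  # what size: 22 would hold

# Equality compares length and content; a NUL character is not a terminator.
print(field_20 == "last_parti")        # True
print(field_22 == "last_parti")        # False: length 11 vs 10
print(field_22 == "last_parti\u0000")  # True
```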
Normally you can use strz instead of str to read a zero-terminated string from a file. In that case, if there is a zero terminator before the size is reached, the string ends there and the remaining bytes up to size are ignored. But as you've already found out, strz is currently broken for UTF-16.
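For comparison, here is a sketch of what zero-terminated reading would need to do for UTF-16: search for a full two-byte zero code unit at an even offset inside the size-limited area, then drop everything from there on. Searching for a single zero byte would not work, because ASCII characters in UTF-16LE already contain a zero byte ("l" is 6c 00). This is an illustration in plain Python, not how Kaitai Struct implements strz; the function name read_utf16_strz is hypothetical.

```python
def read_utf16_strz(data: bytes, size: int) -> str:
    """Hypothetical strz-style reader for UTF-16LE inside a fixed-size area."""
    area = data[:size]
    # Walk code units (2 bytes each); a terminator is a full 00 00 unit,
    # not just any zero byte.
    for offset in range(0, len(area) - 1, 2):
        if area[offset:offset + 2] == b"\x00\x00":
            area = area[:offset]
            break
    return area.decode("utf-16-le")


raw = "last_parti".encode("utf-16-le") + b"\x00" * 6  # 26 bytes
for size in (20, 22, 26):
    # The padding is cut off at the terminator, so every size matches.
    print(size, read_utf16_strz(raw, size) == "last_parti")
```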
> Also when size=22, making the string into "last_parti\0\0" makes it false, but making it "last_parti\0" is true even though I only specified 1 byte from octal? It's true when I do "last_parti\u0000", as expected.
Kaitai Struct strings are always Unicode, so \0 is actually the same as \u0000. Both escapes stand for a single Unicode code point, U+0000.
Your string in the file is stored as UTF-16, so every two zero bytes in the file become one U+0000 code point in the final string. This is why with size: 22 you get "last_parti\0" and not "last_parti\0\0".
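The byte arithmetic can be checked in plain Python, where "\0" is likewise a single U+0000 character (there it is a one-digit octal escape that yields code point 0). Each U+0000 encodes to two zero bytes in UTF-16LE, so a 22-byte field holds "last_parti" (20 bytes) plus exactly one U+0000. This is an analogy only; the Kaitai Struct expression language has its own literal parser.

```python
# "\0" and "\u0000" are the same single code point.
print("\0" == "\u0000", len("\0"))  # True 1

for literal in ("last_parti", "last_parti\0", "last_parti\0\0"):
    # UTF-16LE uses 2 bytes per code unit for these characters.
    print(repr(literal), len(literal.encode("utf-16-le")))
# 'last_parti' 20
# 'last_parti\x00' 22
# 'last_parti\x00\x00' 24
```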