When using string literals in equality operators, should the size affect whether the equality is successful?

Open humdogm opened this issue 3 years ago • 2 comments

I am testing with the below ksy definitions and the attached text file.

meta:
  id: try
  
seq:
  - id: thing
    type: str
    encoding: UTF-16LE
    size: 20
    
instances:
  is_last_parti:
    value:
      thing == "last_parti"

last_parti.txt

When size is set to 20, the value instance is true. If the size is set to 22 or higher, the value instance is false even though the string does not continue past 20.

Also when size=22, making the string into "last_parti\0\0" makes it false, but making it "last_parti\0" is true even though I only specified 1 byte from octal? It's true when I do "last_parti\u0000", as expected. But for longer strings it's a PITA to keep appending \u0000 at the end. UTF-16 doesn't work with terminators, otherwise I'd use that.

If the string does not use the whole buffer defined for it, should string comparison with a literal fail?

humdogm, Sep 23 '21 19:09

Have you tried strz instead of str?

KOLANICH, Sep 23 '21 21:09

@KOLANICH OP said that the string is in UTF-16, so they can't use strz because of #187.

@humdogm

> When size is set to 20, the value instance is true. If the size is set to 22 or higher, the value instance is false even though the string does not continue past 20.

With type: str, there is no zero-termination - the string is always exactly as long as the size says. (In this case, because it's UTF-16, size: 20 gives a string of length 10, size: 22 gives length 11, etc.) If there are zero bytes at the end, those are treated like any other character and stored in the final string.

So if you use size: 20, Kaitai Struct will read the string "last_parti", and with size: 22 it will read "last_parti\0". Kaitai Struct strings in memory have an explicit length and are not zero-terminated, so these two strings are not equal, because they have different lengths.
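The length difference can be reproduced outside Kaitai Struct. The following is a minimal Python sketch, not Kaitai Struct's own code: `data` is a hypothetical stand-in for the attached file (its exact contents are assumed), and `read_fixed_str` is an illustrative helper that mimics a fixed-size read without zero-termination.

```python
# Hypothetical stand-in for the attached file: "last_parti" in UTF-16LE
# followed by zero padding (the real file contents are an assumption).
data = "last_parti".encode("utf-16-le") + b"\x00" * 4


def read_fixed_str(buf: bytes, size: int) -> str:
    """Decode exactly `size` bytes; trailing U+0000 characters are kept."""
    return buf[:size].decode("utf-16-le")


s20 = read_fixed_str(data, 20)  # 20 bytes -> 10 characters
s22 = read_fixed_str(data, 22)  # 22 bytes -> 11 characters, last one is U+0000

print(len(s20), len(s22))
print(s20 == "last_parti")    # equal: same 10 characters
print(s22 == "last_parti")    # not equal: s22 has an extra U+0000 at the end
print(s22 == "last_parti\0")  # equal once the literal carries the U+0000 too
```

Python strings also carry an explicit length, so the comparison behaves the same way as described above.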

Normally you can use strz instead of str to read a zero-terminated string from a file. In that case, if there is a zero terminator before the size is reached, the string ends early and all zeroes at the end are ignored. But as you've already found out, strz is currently broken for UTF-16.
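Until that works, the same "stop at the first terminator" behavior could be emulated in application code after parsing. This is only a sketch under assumptions: `read_utf16le_strz` is a hypothetical helper, not part of any Kaitai Struct runtime, and it assumes the field size is an even number of bytes.

```python
def read_utf16le_strz(buf: bytes, size: int) -> str:
    """Decode up to `size` bytes of UTF-16LE, stopping at the first U+0000.

    Assumes `size` is even, so the chunk holds whole 2-byte code units.
    """
    chunk = buf[:size]
    # Step through whole 2-byte code units so that adjacent zero bytes that
    # straddle two characters (e.g. "a" + U+0100 is 61 00 00 01) are not
    # mistaken for a terminator.
    for i in range(0, len(chunk) - 1, 2):
        if chunk[i:i + 2] == b"\x00\x00":
            chunk = chunk[:i]
            break
    return chunk.decode("utf-16-le")


# Hypothetical padded field: "last_parti" followed by two U+0000 characters.
padded = "last_parti".encode("utf-16-le") + b"\x00" * 4

print(read_utf16le_strz(padded, 20) == "last_parti")
print(read_utf16le_strz(padded, 22) == "last_parti")
print(read_utf16le_strz(padded, 24) == "last_parti")
```

With this helper the comparison no longer depends on how much zero padding the declared size happens to include.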

> Also when size=22, making the string into "last_parti\0\0" makes it false, but making it "last_parti\0" is true even though I only specified 1 byte from octal? It's true when I do "last_parti\u0000", as expected.

Kaitai Struct strings are always Unicode, so \0 is actually the same as \u0000. Both escapes stand for a single Unicode code point, U+0000.

Your string in the file is stored as UTF-16, so for every two zero bytes in the file you get one U+0000 code point in the final string. This is why with size: 22 you get "last_parti\0" and not "last_parti\0\0".
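Python happens to treat these escapes and UTF-16 decoding the same way, so the point can be checked with a short sketch (an analogy in Python, not Kaitai Struct's expression language; the variable names are illustrative):

```python
# "\0" and "\u0000" are two spellings of the same single code point, U+0000.
nul_short = "\0"
nul_long = "\u0000"
same_escape = nul_short == nul_long

# In UTF-16, every two zero bytes decode to one U+0000 code point.
one_nul = b"\x00\x00".decode("utf-16-le")
two_nuls = b"\x00\x00\x00\x00".decode("utf-16-le")

print(same_escape)
print(len(one_nul), len(two_nuls))
```

So a literal ending in "\0\0" only matches a field that has four zero bytes after the text, which corresponds to size: 24 in the example above.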

dgelessus, Sep 23 '21 23:09