lua-struct
Supporting wide character (UTF-16) strings
I needed to unpack zero-terminated strings where each character is 2 bytes in little-endian (lo-hi) order, even though almost all of the strings are in the ASCII range. So this is what I added to unpack:
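elseif opt == 'S' then -- zero-terminated wide-character string, lo-hi (UTF-16LE)
  local chars = {}
  while true do
    local lo, hi = stream:byte(iterator, iterator + 1)
    iterator = iterator + 2
    if lo == nil or hi == nil then -- end of stream reached without a terminator
      break
    end
    local wch = lo + 256 * hi
    if wch == 0 then -- 0x0000 terminates the string
      break
    end
    -- keep ASCII as-is; anything else becomes '~' (no real Unicode decoding)
    table.insert(chars, wch < 128 and string.char(wch) or '~')
  end
  table.insert(vars, table.concat(chars))
elseif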
This is the most controversial/unfinished of my mods, since it assumes little-endian (lo-hi) byte order: many apps use that, even though the default per RFC 2781 is big-endian (see https://en.wikipedia.org/wiki/UTF-16#Byte_order_encoding_schemes ). In addition, I don't check for a byte order mark (https://en.wikipedia.org/wiki/Byte_order_mark ).
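A minimal sketch of how both concerns could be handled, assuming the caller supplies a default byte order for data without a BOM. `read_unit` and `detect_bom` are hypothetical helpers (not part of lua-struct): the first reads one 16-bit code unit in either order, the second honors an FE FF / FF FE byte order mark if one is present.

```lua
-- Hypothetical helpers, not part of lua-struct.

-- Return the 16-bit code unit starting at pos, or nil if fewer than 2 bytes remain.
local function read_unit(s, pos, big_endian)
  local b1, b2 = s:byte(pos, pos + 1)
  if b1 == nil or b2 == nil then
    return nil
  end
  if big_endian then
    return b1 * 256 + b2 -- hi-lo (UTF-16BE)
  else
    return b2 * 256 + b1 -- lo-hi (UTF-16LE)
  end
end

-- Look for a BOM at pos. Returns the byte order to use and the position after any BOM.
-- Without a BOM, fall back to the caller's default (RFC 2781 suggests big-endian).
local function detect_bom(s, pos, default_big_endian)
  local b1, b2 = s:byte(pos, pos + 1)
  if b1 == 0xFE and b2 == 0xFF then
    return true, pos + 2  -- FE FF: big-endian BOM
  elseif b1 == 0xFF and b2 == 0xFE then
    return false, pos + 2 -- FF FE: little-endian BOM
  end
  return default_big_endian, pos
end

-- Example: "Hi" in UTF-16LE with a BOM, followed by a 0x0000 terminator.
local data = "\255\254H\0i\0\0\0"
local big_endian, pos = detect_bom(data, 1, true)
local units = {}
while true do
  local u = read_unit(data, pos, big_endian)
  pos = pos + 2
  if u == nil or u == 0 then break end
  units[#units + 1] = u
end
print(big_endian, #units, units[1], units[2]) -- prints false, 2, 72, 105 (tab-separated)
```

The BOM overrides the default here; whether a BOM should be allowed mid-stream (for example, before every string in a record) is left open.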
Nor do I correctly handle code points above 127 (they are replaced with '~'). How to handle that correctly in Lua is a puzzle. I am guessing the right thing would be to convert to UTF-8 for the internal string (which matches ASCII below 128). In any case, this is not production-ready, but it meets an existing need.
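One way to follow the UTF-8 idea, sketched under the assumption that the input is UTF-16 in a known byte order. `utf16_to_utf8` and `utf8_encode` are hypothetical names. Surrogate pairs are combined into code points above U+FFFF, and unpaired surrogates are replaced with U+FFFD, which is a choice made for this sketch, not something the code above does. Only arithmetic is used, so it should also run on Lua 5.1, which has no `utf8` library.

```lua
-- Hypothetical helpers for illustration; not part of lua-struct.

-- Encode one code point (0..0x10FFFF) as a UTF-8 byte string.
local function utf8_encode(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  else
    return string.char(0xF0 + math.floor(cp / 0x40000),
                       0x80 + math.floor(cp / 0x1000) % 0x40,
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end

-- Decode UTF-16 from pos until a 0x0000 unit or the end of the data.
-- Returns the UTF-8 string and the position just after the terminator.
local function utf16_to_utf8(s, pos, big_endian)
  local out = {}
  local function unit(p)
    local b1, b2 = s:byte(p, p + 1)
    if b1 == nil or b2 == nil then return nil end
    if big_endian then return b1 * 256 + b2 else return b2 * 256 + b1 end
  end
  while true do
    local u = unit(pos)
    if u == nil then break end -- truncated input: stop
    pos = pos + 2
    if u == 0 then break end   -- terminator
    if u >= 0xD800 and u <= 0xDBFF then
      -- high surrogate: combine with a following low surrogate if there is one
      local lo = unit(pos)
      if lo and lo >= 0xDC00 and lo <= 0xDFFF then
        u = 0x10000 + (u - 0xD800) * 0x400 + (lo - 0xDC00)
        pos = pos + 2
      else
        u = 0xFFFD -- unpaired high surrogate
      end
    elseif u >= 0xDC00 and u <= 0xDFFF then
      u = 0xFFFD   -- stray low surrogate
    end
    out[#out + 1] = utf8_encode(u)
  end
  return table.concat(out), pos
end
```

On Lua 5.3 or later, `utf8.char(cp)` could stand in for `utf8_encode`.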
@EnTerr Hello, did you add anything else too? I added your code to the unpack function but still can't decode UTF-16 strings...
Not a solution, but in Lua strings are plain byte sequences, so each encoded code point is simply split into 8-bit values. What those bytes are depends on the encoding you're using (UTF-8/UTF-16BE/UTF-16LE), so you just have to know which one you're working with to anticipate the byte order in the string.
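To illustrate that point, a small sketch that spells out the bytes of one character, U+00E9 (é), in each of the three encodings. The byte strings are written by hand, and `hex_bytes` is a hypothetical helper. `#` reports 2 in every case, because Lua counts bytes, not characters.

```lua
-- Hex-dump a Lua string byte by byte (hypothetical helper).
local function hex_bytes(s)
  local hex = {}
  for i = 1, #s do
    hex[#hex + 1] = string.format("%02X", s:byte(i))
  end
  return table.concat(hex, " ")
end

-- The same character, U+00E9, as raw bytes in three encodings.
local encodings = {
  { name = "UTF-8",    bytes = "\195\169" }, -- C3 A9
  { name = "UTF-16BE", bytes = "\0\233" },   -- 00 E9 (hi-lo)
  { name = "UTF-16LE", bytes = "\233\0" },   -- E9 00 (lo-hi)
}

for _, e in ipairs(encodings) do
  print(e.name, #e.bytes, hex_bytes(e.bytes))
end
```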