LXMF `LXMessage.unpack`: check if `title` and `content` are byte arrays or strings

If a the msgpacked portion of an LXMessage uses strings (fixstr tags 0xa0 to 0xbf, and probably str_8 (0xd9), str_16 (0xda), and str_32 (0xdb) as well) instead of byte arrays, LXMF will happily unpack them but then crash when trying to decode the string. For example:

0090: 94 CB 41 D9 EF DB 0D 00  00 00 AC 48 65 6C 6C 6F  ..A........Hello
00a0: 2C 20 6C 78 6D 66 21 A0  C0                       , lxmf!..

(Side note, is there a reason to LXMF uses byte arrays in the msgpack instead of strings? Like, does msgpack not specify UTF-8 as the string format or something?)

This PR changes unpack to have logic similar to LXMessage.__init__, where it will always decode it into UTF-8 byte array based on the type.

Feb 26 '25 20:02 leroycep

Content and titles should never be strings, but always binary data in UTF-8 encoding. To support any language and character set, LXMF only supports UTF-8 for parts that are intended for human readability.

This should have been made more explicit, though, and I think the right way to go about this is to not accept messages where the byte arrays of title and content cannot be decoded as UTF-8. It should definitely not crash, of course ;)

As you can see in set_content_from_string() and set_title_from_string(), the passed string values are encoded to UTF-8, and then set on LXMessage instance. These helper methods exist for message creation. Over the wire, there should never be any string data types in those parts of the message-packed data structure.

Feb 27 '25 11:02 markqvist

Quoting the msgpack spec:

  * **String** extending Raw type represents a UTF-8 string

It seems like msgpack defines the string type to mean a UTF-8 encoded byte array. That implies that the implementation should be switched to use only msgpack strings and not byte array, no?

Feb 27 '25 15:02 leroycep