Support UTF8 in all string manipulation functions

Open jaromil opened this issue 3 years ago • 1 comment

HEAP buffer contents (values) should support UTF8, and string data should be modified only through UTF8-compliant string functions (see CODEC).

The CODEC itself may include a utf8 or even utf16 property to classify such memory contents.

Aug 02 '22 14:08 jaromil

Verified that UTF8 is already supported through all default Lua functions and OCTET conversion to/from string:

[*] Interactive console, press ctrl-d to quit.
print'⭐'
⭐
[*] Script successfully executed
ut = '⭐'
[*] Script successfully executed
print(ut)
⭐
[*] Script successfully executed
u = O.from_string('⭐')
[*] Script successfully executed
print(u)
4q2Q
[*] Script successfully executed
print(u:string())
⭐
[*] Script successfully executed
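
The `4q2Q` printed for the OCTET above is consistent with the base64 encoding of the three UTF-8 bytes of U+2B50 (E2 AD 90). The following is a minimal C sketch to check that by hand. The helper names `utf8_encode` and `base64_encode` are hypothetical, written only for this illustration, and are not Zenroom functions. That the console prints octets as base64 is an assumption inferred from the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes to out and returns the byte count,
 * or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: standard base64 with '=' padding.
 * out must have room for 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
size_t base64_encode(const unsigned char *in, size_t len, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i += 3) {
        /* Pack up to three input bytes into a 24-bit group. */
        uint32_t v = (uint32_t)in[i] << 16;
        if (i + 1 < len) v |= (uint32_t)in[i + 1] << 8;
        if (i + 2 < len) v |= (uint32_t)in[i + 2];

        out[o++] = tbl[(v >> 18) & 0x3F];
        out[o++] = tbl[(v >> 12) & 0x3F];
        out[o++] = (i + 1 < len) ? tbl[(v >> 6) & 0x3F] : '=';
        out[o++] = (i + 2 < len) ? tbl[v & 0x3F] : '=';
    }
    out[o] = '\0';
    return o;
}
```

Calling `utf8_encode(0x2B50, buf)` fills `buf` with E2 AD 90, and passing those three bytes to `base64_encode` yields `4q2Q`, matching the console output.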

Only the string manipulation functions (strtok, split, ...) need to be re-implemented to support UTF8.

This project may come in handy: https://github.com/sheredom/utf8.h
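
To illustrate why byte-oriented helpers need a UTF8-aware replacement: standard `strtok` treats every byte of the delimiter string as a separate delimiter, so splitting on `⭐` (E2 AD 90) would also cut `€` (E2 82 AC) at its shared lead byte. Below is a rough sketch of a tokenizer that compares whole UTF-8 sequences instead. The names `utf8_char_len`, `utf8_is_delim` and `utf8_strtok_r` are hypothetical and belong neither to Zenroom nor to utf8.h. The sketch only checks lead and continuation bytes and does not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of the UTF-8 character at s.
 * s must not point at the terminating NUL. A malformed or truncated
 * sequence counts as a single byte so that scanning always advances. */
static size_t utf8_char_len(const char *s)
{
    unsigned char c = (unsigned char)s[0];
    size_t want = 1;

    if ((c & 0xE0) == 0xC0) want = 2;
    else if ((c & 0xF0) == 0xE0) want = 3;
    else if ((c & 0xF8) == 0xF0) want = 4;

    /* Each following byte must be a continuation byte (10xxxxxx).
     * A NUL fails this check, so we never read past the terminator. */
    for (size_t i = 1; i < want; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            return 1;
    }
    return want;
}

/* Nonzero if the n-byte character at s occurs in the delimiter set,
 * comparing complete sequences rather than individual bytes. */
static int utf8_is_delim(const char *s, size_t n, const char *delims)
{
    const char *d = delims;

    while (*d != '\0') {
        size_t dn = utf8_char_len(d);
        if (dn == n && memcmp(s, d, n) == 0)
            return 1;
        d += dn;
    }
    return 0;
}

/* strtok_r-style tokenizer whose delimiters are UTF-8 characters.
 * Like strtok_r, it overwrites the start of each delimiter with NUL. */
char *utf8_strtok_r(char *str, const char *delims, char **save)
{
    char *s;
    char *tok;

    assert(delims != NULL && save != NULL);
    s = (str != NULL) ? str : *save;
    if (s == NULL)
        return NULL;

    /* Skip any leading delimiter characters. */
    while (*s != '\0') {
        size_t n = utf8_char_len(s);
        if (!utf8_is_delim(s, n, delims))
            break;
        s += n;
    }
    if (*s == '\0') {
        *save = NULL;
        return NULL;
    }

    /* Scan character by character until the next delimiter. */
    tok = s;
    while (*s != '\0') {
        size_t n = utf8_char_len(s);
        if (utf8_is_delim(s, n, delims)) {
            *s = '\0';
            *save = s + n;
            return tok;
        }
        s += n;
    }
    *save = NULL;
    return tok;
}
```

With the input `a€b⭐c` and the delimiter `⭐`, this sketch returns the tokens `a€b` and `c`, whereas plain `strtok` would split inside `€`.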

Aug 13 '22 07:08 jaromil