virgil
UTF-8 string literals
Does Virgil support UTF-8 string literals?
The documentation suggests it does: https://github.com/titzer/virgil/blob/3038dead280099b736f312e2b091b053cb0cfbf7/doc/lib-issues.txt#L116
Here I've inserted the copyright character in a string literal:
$ cat hello.v3
def main() {
    System.puts("Hello World ©\n");
}
$ virgil run tmp/hello.v3
[tmp/hello.v3 @ 2:21] ParseError: invalid string literal
System.puts("Hello World ©\n");
^
Hex byte values work though:
$ cat hello.v3
def main() {
    System.puts("Hello World \xC2\xA9\n");
}
$ virgil run hello.v3
Hello World ©
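The two bytes in `\xC2\xA9` are just the UTF-8 encoding of ©, code point U+00A9. Code points from U+0080 to U+07FF take two bytes of the form `110xxxxx 10xxxxxx`, with the 11 code-point bits split 5/6 between them. A minimal POSIX shell sketch of that arithmetic follows; `utf8_2byte` is a hypothetical helper name for illustration, not part of Virgil.

```shell
#!/bin/sh
# utf8_2byte: hypothetical helper (not part of Virgil).
# Encodes one code point in U+0080..U+07FF as a \xHH\xHH escape sequence.
utf8_2byte() {
    cp=$(( $1 ))
    if [ "$cp" -lt 128 ] || [ "$cp" -gt 2047 ]; then
        echo "utf8_2byte: $1 is outside U+0080..U+07FF" >&2
        return 1
    fi
    b1=$(( 0xC0 | (cp >> 6) ))    # 110xxxxx: top 5 of the 11 bits
    b2=$(( 0x80 | (cp & 0x3F) ))  # 10xxxxxx: low 6 bits
    printf '\\x%02X\\x%02X\n' "$b1" "$b2"
}

utf8_2byte 0xA9    # copyright sign: prints \xC2\xA9
```

For ©, 0xA9 >> 6 is 2, giving 0xC2, and 0xA9 & 0x3F is 0x29, giving 0xA9, which matches the escapes that work in the literal above.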
You're right, that's a bug. It should handle UTF-8 in string literals, but it does not yet.
I was planning on improving the support for Unicode by changing the string type (currently an alias for Array<byte>), but this could maybe be supported just by allowing the UTF-8 representation through.
Thanks.
A workaround is to convert UTF-8 strings to hex byte values with, for example:
$ echo -n "Hello World ©" | od -A n -t x1 | tr -d '\n' | sed 's/ /\\x/g'
\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x20\xc2\xa9
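That pipeline works for this example, but without `-v`, `od` replaces runs of identical 16-byte output lines with a single `*`, so a longer string with repeated chunks would be mangled. It also escapes plain ASCII, which makes the literal hard to read. The sketch below is a variant under stated assumptions: `to_virgil_literal` is a hypothetical helper name, and keeping printable ASCII unescaped assumes Virgil's `\x` escape consumes exactly two hex digits (unlike C's greedy `\x`). If that doesn't hold, escape every byte as the pipeline above does.

```shell
#!/bin/sh
# to_virgil_literal: hypothetical helper (not part of Virgil).
# Reads bytes on stdin and prints the body of a string literal in which
# printable ASCII is kept as-is and every other byte becomes \xHH.
# ASSUMPTION: Virgil's \x escape takes exactly two hex digits.
to_virgil_literal() {
    # -v: don't collapse repeated lines into "*"
    # -t u1: one unsigned decimal per byte, so awk needs no hex parsing
    od -A n -v -t u1 | awk '
        {
            for (i = 1; i <= NF; i++) {
                b = $i + 0
                if (b >= 32 && b <= 126 && b != 34 && b != 92)
                    printf "%c", b         # printable ASCII, except the quote and backslash
                else
                    printf "\\x%02X", b    # everything else, including UTF-8 bytes
            }
        }'
}

printf '%s' 'Hello World ©' | to_virgil_literal; echo
# prints Hello World \xC2\xA9
```

The double quote and backslash are emitted as `\x22` and `\x5C` rather than `\"` and `\\`, so the sketch relies only on the `\xHH` form already shown to work in this thread.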