don't garble partial multi-byte character after control sequence
When using lf to list files, emacs-libvterm may receive a partial multi-byte character, for example:
$ echo -n '招聘' | hexdump -C
00000000  e6 8b 9b e8 81 98
; first chunk: "招", a control sequence, and a partial character
(vterm--filter process "\xE6\x8B\x9B\e[14;111H\xE8")
; second chunk: the rest of "聘"
(vterm--filter process "\x81\x98")
This sends "\xE8", which is not a complete character, to libvterm on its own.
This looks good as far as I can tell (so I'd go ahead and merge it), but can you please explain a little bit more what's happening here?
(vterm--filter process "\xE6\x8B\x9B\e[14;111H\xE8")
(vterm--filter process "\x81\x98")
These translate into the following calls:
; write "招", ok
(vterm--write-input vterm--term (decode-coding-string "\xE6\x8B\x9B" locale-coding-system t))
; move the cursor to line 14, column 111, ok
(vterm--write-input vterm--term "\e[14;111H")
; write the lone byte "\xE8", which decodes to a raw 8-bit byte instead of a character, BAD
(vterm--write-input vterm--term (decode-coding-string "\xE8" locale-coding-system t))
; write "\x81\x98", two stray continuation bytes that also decode to raw 8-bit bytes, BAD
(vterm--write-input vterm--term (decode-coding-string "\x81\x98" locale-coding-system t))
Each of the last two calls to vterm--write-input goes through this path:
Fvterm_write_input(env, nargs, args, data)
  len = string_bytes(env, args[1]);
    env->copy_string_contents(env, args[1], NULL, &size);
      module_copy_string_contents(env, args[1], NULL, &size);
        lisp_str_utf8 = encode_string_utf_8(lisp_str, Qnil, true, Qnil /* HANDLE-8-BIT */, Qnil);
  /* len is 0 now !!! */
  env->copy_string_contents(env, args[1], bytes, &len);
  vterm_input_write(term->vt, bytes, len); // zero bytes !!!
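Because HANDLE-8-BIT is Qnil, encode_string_utf_8 fails (returns nil) for any string containing raw 8-bit bytes. The reported size is therefore 0, and vterm_input_write receives nothing for either "\xE8" or "\x81\x98", so the character "聘" (\xE8\x81\x98) is silently thrown away.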
This patch buffers the trailing "\xE8" until the next call to vterm--filter, where it combines with "\x81\x98" into a complete UTF-8 character. The original code already tried to handle partial multi-byte characters, but it had an off-by-one error.
@Sbozzolo @jixiuf could you merge this?