node-pty
node-pty copied to clipboard
Several strings in win\pty.cc are converting wide strings to narrow strings
From @rprichard in https://github.com/Tyriar/node-pty/pull/103#discussion_r126276801
I'm not 100% sure, but I think the initializer of
msg_converts the wide string to a narrow string by truncating each wchar_t value to a char. It will only be correct for ASCII characters. I'm guessing node.js and/or v8 want UTF-8 strings? Standard C++ has a way to do that conversion (as of C++11, I think -- I'm thinking of std::codecvt), and maybe node.js / v8 also does? Windows has C-style APIs to do the conversion.I think the pattern already existed in the code, so it's probably not necessary to fix it in this PR? e.g. see
shellpath_'s initializer.
@Tyriar Imho the conversion is not needed at all since Windows uses internally a type of UTF-16 for wchar_t and so does Javascript (not sure though if they agree on all parts under all circumstances as there are several flavors of UTF-16).
Untested:
wchar_t *some_wstring;
...
Nan::MaybeLocal<v8::String> error = Nan::New<v8::String>(
(uint16_t *) some_wstring,
wcslen(some_wstring));
Nan::ThrowError(error.ToLocalChecked());
@jerch I did something like that but I wasn't sure what the best way to test it was.
@Tyriar The official docs state the following:
- Windows: UTF-16 (see https://msdn.microsoft.com/en-us/library/dd374081.aspx), this source does not say anything about endianess, on a typical x86 Windows uses LE, probably they just use the arch endianess
- Javascript: UTF-16 or UCS-2 (see http://es5.github.io/x2.html and https://mathiasbynens.be/notes/javascript-encoding), again no endianess is predefined, I guess they simply use the arch endianess again
V8 developer doc mentions UTF-16 for method v8::String::NewFromTwoByte (http://bespin.cz/~ondras/html/classv8_1_1String.html#a876615eb027092a6a71a4e7d69b82d00) - seems to be safe to just cast the wchar_t* to uint16_t*.
And for testing - hmm, to see if all wchar_ts get correctly mapped and understood in JS maybe you can use the codePointAt() method in JS and compare the values with the codepoint number from C++.
Btw the new for (xy in s) and the Array.from(s) in JS act on codepoints and not on charcodes anymore, which is helpful to get an iteration on those running. String.length still maps to the charcode length though.