WriteConsoleOutputW doesn't work with wide chars and surrogate pairs in Windows Terminal
Windows Terminal version
1.22.11141.0
Windows build number
10.0.26100.0
Other Software
cmd.exe, conhost.exe and Windows Terminal
Steps to reproduce
Run this code in cmd or conhost and compare the result with Windows Terminal. The output is different in Windows Terminal.
#include <windows.h>
#include <iostream>
int main() {
// 1) Enable UTF-8 output so Unicode glyphs display correctly
if (!SetConsoleOutputCP(CP_UTF8)) {
std::cerr << "SetConsoleOutputCP failed\n";
return 1;
}
// 2) Get handle and current buffer info
HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
CONSOLE_SCREEN_BUFFER_INFO csbi;
if (!GetConsoleScreenBufferInfo(hConsole, &csbi)) {
std::cerr << "GetConsoleScreenBufferInfo failed\n";
return 1;
}
SHORT cursorX = csbi.dwCursorPosition.X;
SHORT cursorY = csbi.dwCursorPosition.Y + 1;
// 3) Prepare 2×4 CHAR_INFO buffer for our two lines
const SHORT width = 4, height = 2;
CHAR_INFO buffer[height][width] = {};
// Line 1: '糊' · '1'
buffer[0][0].Char.UnicodeChar = L'\u7CCA';
buffer[0][1].Char.UnicodeChar = L'\0';
buffer[0][2].Char.UnicodeChar = L' ';
buffer[0][3].Char.UnicodeChar = L'1';
// Line 2: 👨 (surrogate pair) · ' ' · '2'
buffer[1][0].Char.UnicodeChar = 0xD83D; // High-surrogate
buffer[1][1].Char.UnicodeChar = 0xDC68; // Low-surrogate
buffer[1][2].Char.UnicodeChar = L' ';
buffer[1][3].Char.UnicodeChar = L'2';
// Fill attributes (white on default background)
WORD white = FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE;
for (int y = 0; y < height; ++y)
for (int x = 0; x < width; ++x)
buffer[y][x].Attributes = white;
// 4) Define output region: width=4, height=2, at (cursorX, cursorY-1)
SMALL_RECT writeRegion = {
cursorX,
static_cast<SHORT>(cursorY - 1),
static_cast<SHORT>(cursorX + width - 1),
static_cast<SHORT>(cursorY - 2 + height)
};
COORD bufSize = { width, height };
COORD bufCoord = { 0, 0 };
// 5) Write our 2×4 block into the console buffer
if (!WriteConsoleOutputW(
hConsole,
reinterpret_cast<CHAR_INFO*>(buffer),
bufSize,
bufCoord,
&writeRegion
)) {
std::cerr << "WriteConsoleOutputW failed\n";
return 1;
}
// 6) Compute the new cursor position:
COORD newPos;
newPos.X = 0; // or wherever you want the prompt to start
newPos.Y = static_cast<SHORT>(cursorY + 1); // first row below the 2-row block (cursorY is already the original row + 1)
// 7) Move the cursor there:
SetConsoleCursorPosition(hConsole, newPos);
// 8) Now return; the next thing you see will be the shell prompt at newPos.
return 0;
}
Expected Behavior
Correct with cmd and conhost:
Actual Behavior
Wrong with Windows Terminal:
In the first line it adds a space after the 糊, and in the second line it prints two replacement characters instead of checking whether the first char is a high surrogate that starts a surrogate pair.
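The check the report asks for can be sketched in portable C++ as follows. This is only an illustration of surrogate-pair detection, not Windows Terminal's code; the names IsHighSurrogate, IsLowSurrogate and DecodeSurrogatePair are made up for this example.

```cpp
// Hypothetical helpers, not part of any Windows API.
constexpr bool IsHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool IsLowSurrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Combines a high/low surrogate pair into one code point.
// Returns U+FFFD (the replacement character) if the two units are not a valid pair.
constexpr char32_t DecodeSurrogatePair(char16_t high, char16_t low) {
    if (!IsHighSurrogate(high) || !IsLowSurrogate(low)) {
        return 0xFFFD;
    }
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}
```

With the two cells from line 2 of the repro, DecodeSurrogatePair(0xD83D, 0xDC68) yields U+1F468 (👨), while an unpaired unit falls back to U+FFFD.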
We've found some similar issues:
- #10287 , similarity score: 84%
- #10810 , similarity score: 84%
- #10055 , similarity score: 83%
If any of the above are duplicates, please consider closing this issue out and adding additional context in the original issue.
Note: You can give me feedback by 👍 or 👎 this comment.
This issue is related to WriteConsoleOutputW, while the other open issue is related to ReadConsoleOutputW.
For row 1: Inserting half of a wide character into a single cell is unspecified. You are not supposed to do that. Well-formed applications do not do so. :)
That convention is just for legacy terminals: they can interpret a cell holding a null char as empty, so the wide char in the cell to its left is drawn across both cells.
But you aren't inserting a null character. You're inserting a space!
Anyway, WriteConsoleOutputCharacter[AW] is for legacy consoles that only support UCS-2 (not surrogate pairs) anyway, so it is fitting that the issues it has are issues for legacy consoles.
But you aren't inserting a null character. You're inserting a space!
Sorry, I have already changed the code. The result is the same.
Anyway, WriteConsoleOutputCharacter[AW] is for legacy consoles that only support UCS-2 (not surrogate pairs) anyway, so it is fitting that the issues it has are issues for legacy consoles.
But it works as expected with the legacy consoles.
I'm the person who advocated for and made the change that broke your code. This happened in #17510.
As you may already know, historically the console on Windows was essentially nothing but a
CHAR_INFO buffer[height][width];
with a GDI window attached to it. The way WriteConsoleOutput worked, was that it literally just iterated over the given area and copied your given CHAR_INFOs over. It didn't even do any input validation beyond that. This allowed you to smuggle hidden data into the console buffer that the user couldn't see, among others. (For what it's worth, this is not a security issue IMO. I mention it as it highlights how "direct" the console APIs were.)
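The "copy the cells over and validate nothing" behavior described above can be pictured with a small portable model. This is a sketch under stated assumptions, not the real conhost code: Cell stands in for CHAR_INFO, and Grid, BlitCells and Example are hypothetical names.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for CHAR_INFO: one UTF-16 code unit plus attributes.
struct Cell {
    char16_t ch = u' ';
    unsigned short attr = 0;
};

template <std::size_t H, std::size_t W>
using Grid = std::array<std::array<Cell, W>, H>;

// Copies `src` into `dst` with its top-left corner at (left, top).
// Only bounds are checked: the code units themselves are copied as-is,
// so half of a surrogate pair is as acceptable as any other value.
template <std::size_t DH, std::size_t DW, std::size_t SH, std::size_t SW>
constexpr Grid<DH, DW> BlitCells(Grid<DH, DW> dst, const Grid<SH, SW>& src,
                                 std::size_t left, std::size_t top) {
    for (std::size_t y = 0; y < SH && top + y < DH; ++y) {
        for (std::size_t x = 0; x < SW && left + x < DW; ++x) {
            dst[top + y][left + x] = src[y][x];
        }
    }
    return dst;
}

// Usage: store a surrogate pair as two independent cells in row 1 of a 2x4 grid.
constexpr Grid<2, 4> Example() {
    Grid<1, 2> src{};
    src[0][0].ch = 0xD83D;  // high surrogate
    src[0][1].ch = 0xDC68;  // low surrogate
    return BlitCells(Grid<2, 4>{}, src, 0, 1);
}
```

Nothing in this model knows that the two cells in row 1 belong together, which is the core of the problem discussed in the rest of this comment.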
This API design fundamentally conflicts with modern Unicode, with its complex languages, combining marks, or emojis. Newer versions of both Windows Terminal and the old console store text like this (figuratively speaking):
std::wstring text[height]; // each row can be an arbitrarily long string
WORD attributes[height][width];
There's no matrix of cells anymore, and in fact there can't be one: those combining marks and emojis can easily rack up 20+ characters for a single cell. This makes dynamic-width rows necessary.
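To make the size argument concrete, here is a small sketch that counts the UTF-16 code units in one user-perceived character. The emoji sequence and the names kFamily and CodeUnitsNeeded are my own illustration, not something taken from the terminal's source.

```cpp
#include <cstddef>
#include <string_view>

// One user-perceived character: man + ZWJ + woman + ZWJ + girl.
// Each emoji is a surrogate pair (2 units); each ZWJ (U+200D) is 1 unit.
constexpr std::u16string_view kFamily =
    u"\U0001F468\u200D\U0001F469\u200D\U0001F467";

// Number of UTF-16 code units the text occupies, which is what a
// "one code unit per cell" matrix would have to find room for.
constexpr std::size_t CodeUnitsNeeded(std::u16string_view cluster) {
    return cluster.size();
}
```

This fairly modest sequence already needs 8 code units, and longer clusters (skin-tone modifiers, more family members, stacked combining marks) grow from there.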
But even just surrogate pairs already break the CHAR_INFO model. You can't store a surrogate pair in a single column with these APIs. You also can't read or write them using non-CHAR_INFO-APIs if you did store them with 2 CHAR_INFOs. The entire Console API has always been inconsistently broken when it came to anything beyond UCS2.
This is why I have continuously worked towards a future where all CHAR_INFO APIs strictly support only UCS2 (no surrogate pairs) or DBCS (ShiftJIS, etc.), just like how it worked up until roughly Windows 10 1607. #17510 is just one more step towards this and in the future your code will also stop working in the old console window. To write surrogate pairs you must use non-CHAR_INFO APIs.
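An application that wants to keep using CHAR_INFO APIs where they are still well defined could test its text first. The following is a hedged sketch of such a test; IsUcs2Only is a hypothetical helper name, and the decision of what to do when it returns false (for example, falling back to WriteConsoleW) is left to the caller.

```cpp
#include <string_view>

// True if `text` contains no surrogate code units, i.e. every character is
// a single UTF-16 unit from the Basic Multilingual Plane (plain UCS-2).
constexpr bool IsUcs2Only(std::u16string_view text) {
    for (char16_t u : text) {
        if (u >= 0xD800 && u <= 0xDFFF) {
            return false;
        }
    }
    return true;
}
```

For the repro above, the first row (糊, space, 1) passes this check, while the second row (👨, space, 2) does not.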
You may say that I'm breaking compatibility with existing applications, and you'd be correct. I'm carefully breaking some situations so that Windows overall can move towards a better future: a future with more features (full grapheme cluster support in 1.22), better performance (~25x at this point!) and fewer bugs (can't even count them anymore). Arguably that's one of the largest user asks for Windows and has been for a long time. Strictly speaking, it also wouldn't break "legacy" applications per se, because those predate the addition of surrogate pairs to the console IMO.
Leonard wrote:
and in the future your code will also stop working in the old console window
@BDisp If for some reason you need such functionality, you could play with vtm (it's still under development, but just in case). Run vtm -r term to run its built-in terminal.
It can output independent cells, even if they are halves of wide characters (discussed in #4345). In vtm, your code will output half of the hieroglyph, but with a minor edit:
// Line 1: '糊' · '1'
buffer[0][0].Char.UnicodeChar = L'\u7CCA';
buffer[0][1].Char.UnicodeChar = L'\u7CCA';
buffer[0][2].Char.UnicodeChar = L' ';
buffer[0][3].Char.UnicodeChar = L'1';
you will get the full character:
Including the same wide character in both halves is what is required for this to operate as expected on all versions of the Windows Console as well.
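The "same character in both halves" rule can be wrapped in a small helper. The sketch below is portable and uses a stand-in cell type rather than CHAR_INFO; CharCell, PutWide and ExampleRow are hypothetical names chosen for this example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for CHAR_INFO.
struct CharCell {
    char16_t ch = u' ';
    unsigned short attr = 0;
};

// Stores a double-width BMP character at `col` by writing the same code unit
// into both the leading and the trailing cell. Returns false if the trailing
// half would fall outside the row.
template <std::size_t W>
constexpr bool PutWide(std::array<CharCell, W>& row, std::size_t col,
                       char16_t ch, unsigned short attr) {
    if (col + 1 >= W) {
        return false;  // no room for the trailing half
    }
    row[col]     = CharCell{ch, attr};
    row[col + 1] = CharCell{ch, attr};
    return true;
}

// Usage: rebuild line 1 of the repro as 糊 (two cells), a space, then '1'.
constexpr std::array<CharCell, 4> ExampleRow() {
    std::array<CharCell, 4> row{};
    PutWide(row, 0, u'\u7CCA', 0x07);
    row[3].ch = u'1';
    return row;
}
```

Copying the resulting cells into a real CHAR_INFO array is a field-by-field assignment; that step is omitted here so the sketch does not depend on windows.h.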
I really appreciate all your feedback, and I understand that new features can cause breaking changes. So, for cases where virtual terminal sequences are disabled, I decided to use WriteConsoleW and WriteConsoleOutputAttribute, which work perfectly with cmd, conhost and Windows Terminal. I know that this is limited to 16 colors, but it supports wide chars and surrogate pairs. Do you want me to close this issue? Thanks.
(To be clear, I meant my comment about "your code will also stop working in the old console window" specifically regarding surrogate pairs. Most wide characters are in the BMP and thus regular UCS2.)
WriteConsoleOutputAttribute is actually also a CHAR_INFO-related API. For instance, it technically allows you to colorize halves of wide cells (we just don't support that right now as a technical limitation).
Because of this, I actually expected it to not work for colorizing surrogate pairs in Windows Terminal 1.22. Does that really work for you?
If you don't mind, can you tell us more about the software you're developing and what versions of Windows you're targeting? My understanding was that proper support for surrogate pairs was added to conhost (the console) only after it got initial support for VT sequences in Windows 10 10586 (in 2015). This would mean that either CHAR_INFOs are fine to use as-is, because you don't need surrogate pairs, or that you can use VT sequences together with surrogate pairs.
I wouldn't close this issue just yet. Even though I wrote all the above, my goal is still to give you a proper solution.
Thanks for all your feedback. WriteConsoleOutputAttribute really doesn't work. I wasn't testing with colors, only with the console's default attributes. Setting custom foreground colors still works, but when background colors are also set, they are only printed on the first line and Windows Terminal doesn't persist them. Thus, the workaround is to use SetConsoleTextAttribute, which does what is expected and works fine with wide chars and surrogate pairs in all consoles. Below are the output and my changed code for all of this. I hope that at least this doesn't break in the future. I know that VT sequences work better, but my solution is for consoles that don't use VT sequences, and it handles wide chars and surrogate pairs with 16 colors.
Here is the output:
- cmd and conhost:
- Windows Terminal:
Here is my current code:
#include <windows.h>
#include <iostream>
#include <algorithm> // std::max
#include <vector>
#include <string>
#include "NativeExports.h"
bool IsVirtualTerminalEnabled(HANDLE hConsole) {
DWORD mode = 0;
if (!GetConsoleMode(hConsole, &mode))
return false; // Handle not valid or error
return (mode & ENABLE_VIRTUAL_TERMINAL_PROCESSING) != 0;
}
/// Ensures the console screen buffer is at least (minCols × minRows).
/// Returns true on success, false on failure.
bool EnsureBufferSize(HANDLE hConsole, SHORT minCols, SHORT minRows) {
CONSOLE_SCREEN_BUFFER_INFO csbi;
if (!GetConsoleScreenBufferInfo(hConsole, &csbi)) return false;
// If buffer is already big enough, nothing to do
if (csbi.dwSize.X >= minCols && csbi.dwSize.Y >= minRows)
return true;
// Compute new size
COORD newSize = csbi.dwSize;
newSize.X = std::max<SHORT>(newSize.X, minCols);
newSize.Y = std::max<SHORT>(newSize.Y, minRows);
// Grow the buffer
if (!SetConsoleScreenBufferSize(hConsole, newSize))
return false;
// Rows are numbered 0 .. newSize.Y - 1, so newSize.Y itself would be out of range
return SetConsoleCursorPosition(hConsole, { 0, static_cast<SHORT>(newSize.Y - 1) }) != 0;
}
const int H = 3; // Height of the buffer
const int W = 4; // Width of the buffer
// Represents a contiguous run of characters sharing the same attribute
struct Run {
WORD attr; // Color attribute
std::wstring text; // UTF-16 text
};
// Splits a single row of CHAR_INFO cells into runs
std::vector<Run> BuildRunsForRow(const CHAR_INFO* rowBuf, int width) {
std::vector<Run> runs;
if (width <= 0) return runs;
int startCol = 0;
while (startCol < width && rowBuf[startCol].Char.UnicodeChar == L'\0') {
++startCol;
}
if (startCol >= width) {
runs.push_back({ 0, L"\n" });
return runs;
}
Run current{ rowBuf[startCol].Attributes, L"" };
int cellCount = 0;
for (int c = startCol; c < width; ++c) {
wchar_t ch = rowBuf[c].Char.UnicodeChar;
if (ch == L'\0') continue; // skip flags
// Check for surrogate pairs
if (0xD800 <= ch && ch <= 0xDBFF) { // High surrogate
if ((c + 1) < width) {
wchar_t ch2 = rowBuf[c + 1].Char.UnicodeChar;
if (0xDC00 <= ch2 && ch2 <= 0xDFFF) { // Low surrogate
if (rowBuf[c].Attributes != current.attr) {
runs.push_back(current);
current = { rowBuf[c].Attributes, L"" };
}
int codepoint = ((ch - 0xD800) << 10)
+ (ch2 - 0xDC00) + 0x10000;
// Column width of the full code point; a negative (unknown) result counts as 1
int w = GetWidth(codepoint);
cellCount += (w < 0 ? 1 : w);
// append both code units to text
current.text.push_back(ch);
current.text.push_back(ch2);
++c; // Skip the next cell
continue;
}
}
}
// normal char
if (rowBuf[c].Attributes != current.attr) {
runs.push_back(current);
current = { rowBuf[c].Attributes, L"" };
}
current.text.push_back(ch);
// Normal BMP character:
int w = GetWidth(ch); // 0,1,2 or -1
if (w < 0) w = 1; // fallback so nothing disappears
cellCount += w;
}
// if we consumed fewer cells than width, pad with spaces
while (cellCount < width) {
current.text.push_back(L' ');
++cellCount;
}
runs.push_back(current);
runs.push_back({ 0, L"\n" });
return runs;
}
// Example usage for all rows in an H×W buffer
void WriteBufferToConsole(CHAR_INFO buf[H][W], HANDLE hConsole, SHORT cursorX, SHORT cursorY) {
for (int r = 0; r < H; ++r) {
auto runs = BuildRunsForRow(buf[r], W);
COORD pos = { cursorX, static_cast<SHORT>(cursorY + r) };
SetConsoleCursorPosition(hConsole, pos);
DWORD written;
for (auto& run : runs) {
SetConsoleTextAttribute(hConsole, run.attr);
WriteConsoleW(
hConsole,
run.text.c_str(),
static_cast<DWORD>(run.text.size()),
&written,
nullptr
);
}
}
}
int main() {
// 1) Enable UTF-8 for Unicode
SetConsoleOutputCP(CP_UTF8);
// 2) Get console handle & cursor
HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
if (IsVirtualTerminalEnabled(hConsole)) {
std::cout << "VT sequences are ENABLED.\n";
}
else {
std::cout << "VT sequences are DISABLED.\n";
}
CONSOLE_SCREEN_BUFFER_INFO csbi;
GetConsoleScreenBufferInfo(hConsole, &csbi);
// Store the original text attributes
WORD originalAttributes = csbi.wAttributes;
// Store the original cursor position
SHORT cursorX = csbi.dwCursorPosition.X;
SHORT cursorY = csbi.dwCursorPosition.Y;
// 3) Master CHAR_INFO buffer (3 rows × 4 cols)
CHAR_INFO buf[H][W] = {};
// Line 1: 糊 1
buf[0][0].Char.UnicodeChar = L'\u7CCA'; // 糊 occupies 2 columns
buf[0][1].Char.UnicodeChar = L'\0'; // flag to skip in Char-path only
buf[0][2].Char.UnicodeChar = L' ';
buf[0][3].Char.UnicodeChar = L'1';
// Line 2: 👨 2
buf[1][0].Char.UnicodeChar = 0xD83D; // High surrogate (👨) occupies 2 columns
buf[1][1].Char.UnicodeChar = 0xDC68; // Low surrogate
buf[1][2].Char.UnicodeChar = L' ';
buf[1][3].Char.UnicodeChar = L'2';
// Line 3: 𝔽 3
buf[2][0].Char.UnicodeChar = 0xD835; // High surrogate (𝔽) occupies 1 column
buf[2][1].Char.UnicodeChar = 0xDD3D; // Low surrogate
buf[2][2].Char.UnicodeChar = L' ';
buf[2][3].Char.UnicodeChar = L'3';
// Foreground color per cell
WORD fgColors[H][W] = {
{ FOREGROUND_RED | FOREGROUND_INTENSITY, FOREGROUND_GREEN | FOREGROUND_INTENSITY, FOREGROUND_BLUE | FOREGROUND_INTENSITY, FOREGROUND_RED | FOREGROUND_GREEN },
{ FOREGROUND_GREEN | FOREGROUND_BLUE, FOREGROUND_RED | FOREGROUND_BLUE, FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE, FOREGROUND_RED },
{ FOREGROUND_GREEN, FOREGROUND_BLUE, FOREGROUND_RED | FOREGROUND_GREEN, FOREGROUND_GREEN | FOREGROUND_BLUE }
};
// Background color per cell
WORD bgColors[H][W] = {
{ BACKGROUND_BLUE, BACKGROUND_RED, BACKGROUND_GREEN, BACKGROUND_BLUE | BACKGROUND_RED },
{ BACKGROUND_GREEN | BACKGROUND_BLUE, BACKGROUND_RED | BACKGROUND_BLUE, BACKGROUND_RED | BACKGROUND_GREEN | BACKGROUND_BLUE, BACKGROUND_RED },
{ BACKGROUND_GREEN, BACKGROUND_BLUE, BACKGROUND_RED | BACKGROUND_GREEN, BACKGROUND_GREEN | BACKGROUND_BLUE }
};
// Assign attributes with different foreground and background colors
for (int r = 0; r < H; ++r) {
for (int c = 0; c < W; ++c) {
// Simple fix: if foreground and background would be the same color, toggle the foreground intensity bit so the text stays visible
if ((fgColors[r][c] & 0x0F) == (bgColors[r][c] >> 4)) {
fgColors[r][c] ^= FOREGROUND_INTENSITY;
}
buf[r][c].Attributes = fgColors[r][c] | bgColors[r][c];
}
}
// Make sure the buffer can hold (cursorX + W) columns and (cursorY + H + 2) rows:
EnsureBufferSize(hConsole,
static_cast<SHORT>(cursorX + W),
static_cast<SHORT>(cursorY + H + 2));
WriteBufferToConsole(buf, hConsole, cursorX, cursorY);
SetConsoleCursorPosition(hConsole, { 0, static_cast<SHORT>(cursorY + H) });
// Restore the original text attributes
SetConsoleTextAttribute(hConsole, originalAttributes);
return 0;
}
Edit: I forgot to say that the code for the GetWidth function can be found at https://github.com/BDisp/WcwidthWrapper. Thanks.
if (IsVirtualTerminalEnabled(hConsole)) {
std::cout << "VT sequences are ENABLED.\n";
}
else {
std::cout << "VT sequences are DISABLED.\n";
}
You currently only check if VT is enabled, but you could also enable it yourself with SetConsoleMode. Is there a reason you don't do that? You could then always use VT sequences, even under the older conhost.
Is there a reason you don't simply enable VT support with SetConsoleMode and then always use VT sequences, even under the older conhost?
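For readers who want to try this suggestion, here is a minimal sketch of turning VT processing on. It is not code from this thread: TryEnableVt, WithVtProcessing and kEnableVtProcessing are hypothetical names, and the flag value is repeated by hand (it matches ENABLE_VIRTUAL_TERMINAL_PROCESSING in the Windows SDK) so that the snippet also compiles where windows.h is unavailable.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Same value as ENABLE_VIRTUAL_TERMINAL_PROCESSING in the Windows SDK.
constexpr unsigned long kEnableVtProcessing = 0x0004;

// Returns `mode` with VT processing switched on and all other bits kept.
constexpr unsigned long WithVtProcessing(unsigned long mode) {
    return mode | kEnableVtProcessing;
}

// Tries to enable VT sequences on stdout. Returns false if stdout is not a
// console or the host rejects the new mode (for example, a very old conhost).
inline bool TryEnableVt() {
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (out == INVALID_HANDLE_VALUE || !GetConsoleMode(out, &mode)) {
        return false;
    }
    return SetConsoleMode(out, WithVtProcessing(mode)) != 0;
#else
    return false;  // not a Windows console
#endif
}
```

A caller could use the return value to choose between a VT code path and a fallback such as the SetConsoleTextAttribute approach shown earlier.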
As I said before, my intention isn't to force VT support but to work with the current console configuration. Windows Terminal always has VT support enabled, so there is no problem using VT sequences there. This is more for cases where we want to run without VT support, or with a remote terminal via SSH, low console resources, etc. I don't think it is a big problem to leave some legacy APIs working without VT support. The code above works great without VT support. I recognize that it isn't justified to spend time on WriteConsoleOutputW, ReadConsoleOutputW, etc. But it would be good to leave what works well today as it is, and let users write whatever code they need to make it work minimally. I have no problem with enabling VT support, and I like it very much. My concern in that snippet wasn't to handle VT support, only to display whether VT sequences are enabled or disabled.
I understand and won't pry any further. I'll close the issue for now in that case. Please let us know if you encounter any other issues! Also, please always feel free to use our discussions section: https://github.com/microsoft/terminal/discussions
However, I'd still like to clarify that VT sequences are always supported via SSH. More importantly though, they also require significantly fewer console resources, counter to what you said (both less memory and less CPU). I strongly recommend using them exclusively in any future applications you may want to create.
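As a rough idea of what moving from 16-color attributes to VT sequences involves, the sketch below maps a Windows attribute word to SGR color parameters. The function names are hypothetical and the constants mirror the FOREGROUND_* and BACKGROUND_* bit values from the Windows SDK; treat it as an illustration rather than a complete renderer.

```cpp
// Windows attribute bits (same values as FOREGROUND_BLUE/GREEN/RED/INTENSITY).
constexpr unsigned kFgBlue = 0x1, kFgGreen = 0x2, kFgRed = 0x4, kFgIntense = 0x8;

// ANSI colors order the bits as red=1, green=2, blue=4, so red and blue swap.
constexpr int AnsiIndex(unsigned rgbBits) {
    return ((rgbBits & kFgRed) ? 1 : 0)
         | ((rgbBits & kFgGreen) ? 2 : 0)
         | ((rgbBits & kFgBlue) ? 4 : 0);
}

// SGR parameter for the foreground half of an attribute (30-37 or 90-97).
constexpr int SgrForeground(unsigned attr) {
    return ((attr & kFgIntense) ? 90 : 30) + AnsiIndex(attr & 0x7);
}

// SGR parameter for the background half of an attribute (40-47 or 100-107).
constexpr int SgrBackground(unsigned attr) {
    return ((attr & 0x80) ? 100 : 40) + AnsiIndex((attr >> 4) & 0x7);
}
```

For example, red text on a blue background becomes the sequence ESC [ 31 ; 44 m, followed by the text itself (surrogate pairs included) and ESC [ 0 m to reset the colors.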
I used to work with VT support, so this isn't about that; I only wanted to see whether some very old stuff still works. Yes, I also agree with closing, because there is no reason to break your great work for this API. Thanks.