ReadConsoleOutputCharacterW behavior change in 1.22

Open bradking opened this issue 7 months ago • 5 comments

Windows Terminal version

1.22.10731.0

Windows build number

10.0.26100.3476

Other Software

Here is sample code that writes a line of text to the console using WriteConsoleW, reads the line back using ReadConsoleOutputCharacterW, and then compares the content.

example.cxx:
#include <cstring>
#include <iomanip>
#include <iostream>
#include <string> // std::wstring
#include <vector>
#include <wchar.h>
#include <windows.h>

int main()
{
  std::wstring const text =
    L"\u092F\u0942\u0928\u093F\u0915\u094B\u0921 " // Hindi
    L"\u03B5\u03AF\u03BD \u03B1\u03B9 "            // Greek
    L"\u0437\u0434\u043E\u0440\u043E\u0432\u043E!" // Russian
    ;

  // Write a line of text to the console.
  HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
  WriteConsoleW(hOut, text.data(), text.size(), nullptr, nullptr);
  WriteConsoleW(hOut, L"\n", 1, nullptr, nullptr);

  // Read the line of text back from the console.
  std::vector<wchar_t> received;
  {
    CONSOLE_SCREEN_BUFFER_INFO screenBufferInfo;
    if (!GetConsoleScreenBufferInfo(hOut, &screenBufferInfo)) {
      std::cerr << "GetConsoleScreenBufferInfo failed\n";
      return 1;
    }
    DWORD width = screenBufferInfo.dwSize.X;
    received.resize(width);
    // Row just above the cursor; the cast avoids a narrowing conversion.
    COORD coord{ 0,
                 static_cast<SHORT>(screenBufferInfo.dwCursorPosition.Y - 1) };
    DWORD charsRead = 0;
    if (!ReadConsoleOutputCharacterW(hOut, received.data(), width, coord,
                                     &charsRead) ||
        charsRead == 0) {
      std::cerr << "ReadConsoleOutputCharacterW failed\n";
      return 1;
    }
  }

  // Compare the line we read to the line we wrote.
  if (std::memcmp(received.data(), text.data(),
                  text.size() * sizeof(wchar_t)) == 0) {
    std::cerr << "Console has expected content" << std::endl;
  } else {
    std::cerr << "Expected output | Received output" << std::endl;
    for (size_t i = 0; i < text.size(); i++) {
      std::cerr << std::setbase(16) << std::setfill('0') << "     "
                << "0x" << std::setw(8) << static_cast<unsigned int>(text[i])
                << " | "
                << "0x" << std::setw(8)
                << static_cast<unsigned int>(received[i]);
      if (static_cast<unsigned int>(text[i]) !=
          static_cast<unsigned int>(received[i])) {
        std::cerr << "   MISMATCH!";
      }
      std::cerr << std::endl;
    }
    std::cerr << std::endl;
    return 1;
  }

  return 0;
}

Steps to reproduce

Compile the above example.cxx sample code and run it in a Windows Terminal.

>cl -EHsc example.cxx
>example

Expected Behavior

ReadConsoleOutputCharacterW recovers what WriteConsoleW wrote, as it did in Windows Terminal 1.21 and always has in Windows Console Host:

>example
यूनिकोड είν αι здорово!
Console has expected content

Actual Behavior

ReadConsoleOutputCharacterW returns text in which some characters are replaced by U+FFFD replacement characters.

>example
यूनिकोड είν αι здорово!
Expected output | Received output
     0x0000092f | 0x0000fffd   MISMATCH!
     0x00000942 | 0x0000fffd   MISMATCH!
     0x00000928 | 0x0000fffd   MISMATCH!
     0x0000093f | 0x00000921   MISMATCH!
     0x00000915 | 0x00000020   MISMATCH!
     0x0000094b | 0x000003b5   MISMATCH!
     0x00000921 | 0x000003af   MISMATCH!
     0x00000020 | 0x000003bd   MISMATCH!
     0x000003b5 | 0x00000020   MISMATCH!
     0x000003af | 0x000003b1   MISMATCH!
     0x000003bd | 0x000003b9   MISMATCH!
     0x00000020 | 0x00000020
     ...

bradking, Mar 31 '25 15:03

I also built Windows Terminal from source and ran git bisect. The behavior change was introduced by #16916.

bradking, Mar 31 '25 15:03

Thanks so much for the comprehensive repro.

This is one of those thorny issues where moving the platform forward comes at the cost of some backwards compatibility.

With the release of 1.22 and the switch to using grapheme clusters by default, combining characters such as U+093F, U+0942, and U+094B (and grapheme bases that require additional characters; there are a lot of cases here) can no longer be inserted into individual cells (or CHAR_INFO) during streaming text output.
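
As a rough illustration of why the read-back no longer matches, the sketch below models each cell as holding one cluster and reports U+FFFD for any cell whose cluster needs more than one UTF-16 code unit. That behavior is an assumption chosen because it reproduces the mismatch table in the report; it is not Terminal's implementation. `extendsPrevious` and `modelCell` are hypothetical names, and the cluster rule is deliberately simplified to the Devanagari dependent vowel signs U+093E through U+094C rather than real Unicode segmentation.

```cpp
#include <cstddef>
#include <string_view>

// Assumption for illustration only: the Devanagari dependent vowel signs
// U+093E..U+094C are the only code units that extend the preceding cluster.
constexpr bool extendsPrevious(char16_t c) {
  return c >= 0x093E && c <= 0x094C;
}

// Code unit that a per-cell read would report for cell index `cell` under
// this model: the character itself for a one-unit cluster, U+FFFD for a
// cluster made of several code units, and a space past the end of the text.
constexpr char16_t modelCell(std::u16string_view text, std::size_t cell) {
  std::size_t i = 0;
  for (std::size_t current = 0; i < text.size(); ++current) {
    const std::size_t start = i++;
    while (i < text.size() && extendsPrevious(text[i])) {
      ++i; // fold extending code units into the current cell
    }
    if (current == cell) {
      return (i - start == 1) ? text[start] : u'\uFFFD';
    }
  }
  return u' ';
}

// First word of the sample text plus the start of the Greek word.
constexpr std::u16string_view kText =
  u"\u092F\u0942\u0928\u093F\u0915\u094B\u0921 \u03B5\u03AF\u03BD";

// These match the first rows of the "Received output" column in the report.
static_assert(modelCell(kText, 0) == u'\uFFFD');
static_assert(modelCell(kText, 1) == u'\uFFFD');
static_assert(modelCell(kText, 2) == u'\uFFFD');
static_assert(modelCell(kText, 3) == u'\u0921');
static_assert(modelCell(kText, 4) == u' ');
static_assert(modelCell(kText, 5) == u'\u03B5');
```

The seven Devanagari code units collapse into four cells under this model, which is why every later character appears shifted when compared index by index against the written text.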

This is one of the ways in which the Windows Console APIs were never sufficient for use with languages other than those which use the Latin alphabet and some limited CJK.

If you have an application that requires strict compatibility with the original one-narrow-character-per-cell measurements offered by the console, you can configure the measurement mode Terminal uses in the Compatibility settings.

DHowett, Apr 02 '25 22:04

"...you can configure the measurement mode Terminal uses in the Compatibility settings."

Is there some capability that applications can use to detect this (e.g., to have an accurate wcwidth implementation)?

mathstuf, Apr 03 '25 13:04

Yes, absolutely! If you set console mode ENABLE_VIRTUAL_TERMINAL_PROCESSING and emit a request for DEC private mode 2027 (DECRQM 2027 "grapheme cluster support"):

\e[?2027$p

you will get a VT-encoded response (DECRPM) indicating whether it is permanently set / enabled (3) or permanently reset / disabled (4).

The full exchange looks something like this:

    TERMINAL || APPLICATION
             <- \e[?2027$p
\e[?2027;3$y ->

If you get a reply indicating 4 or you do not get a reply, the console is in traditional/Windows measurement mode. If you get a reply indicating 3, the console is in grapheme cluster measurement mode.
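
A minimal sketch of interpreting that reply, assuming the application has already written the request and collected the response bytes from console input (that I/O, including a timeout for the no-reply case, is left out here). The names `kQuery2027`, `decrpmStatus`, and `usesGraphemeClusters` are hypothetical, and an empty string stands in for "no reply".

```cpp
#include <cstddef>
#include <string_view>

// DECRQM request for private mode 2027, as described above.
constexpr std::string_view kQuery2027 = "\x1b[?2027$p";

// Extracts the status digit from a DECRPM reply of the form
// ESC [ ? <mode> ; <status> $ y. Returns -1 if the reply is empty,
// malformed, or refers to a different mode.
constexpr int decrpmStatus(std::string_view reply, std::string_view mode) {
  constexpr std::string_view prefix = "\x1b[?";
  constexpr std::string_view suffix = "$y";
  const std::size_t expected =
    prefix.size() + mode.size() + 1 /* ';' */ + 1 /* status */ + suffix.size();
  if (reply.size() != expected) return -1;

  std::size_t pos = 0;
  if (reply.substr(pos, prefix.size()) != prefix) return -1;
  pos += prefix.size();
  if (reply.substr(pos, mode.size()) != mode) return -1;
  pos += mode.size();
  if (reply[pos++] != ';') return -1;
  const char status = reply[pos++];
  if (status < '0' || status > '4') return -1;
  if (reply.substr(pos) != suffix) return -1;
  return status - '0';
}

// Per the explanation above: only a status of 3 means grapheme cluster
// measurement; 4, no reply, or anything unparseable is treated as traditional.
constexpr bool usesGraphemeClusters(std::string_view reply) {
  return decrpmStatus(reply, "2027") == 3;
}

static_assert(usesGraphemeClusters("\x1b[?2027;3$y"));
static_assert(!usesGraphemeClusters("\x1b[?2027;4$y"));
static_assert(!usesGraphemeClusters("")); // no reply
```

Treating every outcome other than 3 as the traditional mode follows the rule stated above and keeps the fallback conservative for consoles that do not answer the query at all.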

The "Unix/wcswidth" measurement mode is somewhat of an outlier here, and I don't have a good answer for how to detect it. It's not the default or the backwards-compatible option so we expect users to only use it when they have an explicit need.

We don't have another both extensible and backwards-compatible mechanism for signaling console state, so right now VT is the best we can offer. Sorry about that.

DHowett, Apr 03 '25 15:04

@DHowett thanks for the explanation! I mainly wanted to make sure this was not unintended, and if the change is intentional then I'm fine with closing this.

In my real use case, our application only writes to the console and so is not affected by this change. Our use of ReadConsoleOutputCharacterW is only in a test case that verifies that we wrote to the console correctly. Is there some other way we can read back from the console now?

bradking, Apr 03 '25 17:04

I've updated our test suite to avoid reading back from the console, so the behavior change in ReadConsoleOutputCharacterW no longer affects us.

Since the change was known and intentional, I'll close this issue.

bradking, May 14 '25 14:05

Sorry about the lack of response or better options here - it has been a much busier April and May than we expected. I think it's the right call to avoid reading the console back in test. We have tests that do that, but they're expressly tests of the console subsystem. 🙂

DHowett, May 14 '25 15:05