terminal icon indicating copy to clipboard operation
terminal copied to clipboard

ICH no longer works correctly on lines containing wide characters

Open j4james opened this issue 2 years ago • 1 comments

Windows Terminal version

Commit 547349af77df16d0eed1c73ba3041c84f7b063da

Windows build number

10.0.19044.2364

Other Software

No response

Steps to reproduce

  1. Create a build from the latest commit.
  2. Start Windows Terminal and open a WSL bash shell.
  3. Execute the following command: printf "\u3053\u3093\u306B\u3061\u306F World\r\e[@\n"

Expected Behavior

This should write out the text こんにちは World, and then insert a space at the start of the line via an ICH sequence. This is what it looks like in Preview version 1.16.2641.0:

image

Actual Behavior

When using the latest commit, the Japanese characters are erased by the ICH sequence, so all that's left is World.

image

So this looks to me like a regression, although I'm not sure when exactly it stopped working.

And note that this fails in OpenConsole as well. It's just easier to confirm that it's a recent regression in Windows Terminals because I know 1.16 was working OK.

j4james avatar Jan 03 '23 21:01 j4james

As far as I can tell, the regression came from the new ROW code in a01500f051bc896465ac4bd37811142bc2347fb8.

The other thing I've noticed is that it only appears to be a problem when inserting a single space. An ICH with a parameter of 2 or more seems to work fine. I wonder if the problem is that it's trying to copy a 2-cell character on top of itself when moving one cell to the right, and in the process it destroys the source data before it can finish writing it to the target.

j4james avatar Jan 03 '23 22:01 j4james

Thanks for digging into this @j4james! Leonard is out for a couple more weeks -- is this an area you'd be comfortable poking at in advance of the 1.17 release?

DHowett avatar Jan 06 '23 21:01 DHowett

The buffer code is a bit of a black box to me, so I don't want to promise anything, but I can take a look and see if there's an easy fix. I suspect this is out of my league though.

j4james avatar Jan 06 '23 22:01 j4james

First off, a few more test cases.

  1. Deleting a character with DCH. This hits the same code path as ICH (the _ScrollRectHorizontally method in AdaptDispatch), but this shows it fails when scrolling in both directions.

     printf " \u3053\u3093\u306B\u3061\u306F World\r\e[P\n"
    
  2. Copying the screen one column to the right with DECCRA. This goes through the CopyRectangularArea method of AdaptDispatch, but it's a similar pattern of code to _ScrollRectHorizontally.

     printf "\u3053\u3093\u306B\u3061\u306F World\n\e[;;;;;;2\$v"
    
  3. Scrolling the screen one column to the right with the ScrollConsoleScreenBuffer API. This goes through conhost's _CopyRectangle function, which again is using a similar pattern of code.

    SetConsoleOutputCP(65001);
    std::cout << "こんにちは World\n";
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    SMALL_RECT scrollRect = { 0, 0, 32767, 32767 };
    CHAR_INFO fill = {L' '};
    ScrollConsoleScreenBuffer(h, &scrollRect, &scrollRect, {1,0}, &fill);
    

In all these cases, what we're doing is iterating over a range of cells in the buffer, and copying them from one location to another like this:

do
{
    const auto data = OutputCell(*textBuffer.GetCellDataAt(sourcePos));
    textBuffer.Write(OutputCellIterator({ &data, 1 }), targetPos);
    source.WalkInBounds(sourcePos, walkDirection);
} while (target.WalkInBounds(targetPos, walkDirection));

But when the source and target are only one cell apart, and the characters we're manipulating are occupying more than one cell, we can end up erasing a character before we've finished copying it.

Let's say we've got a DBCS character in columns 1 and 2, which we're copying one column to the right, so we start by writing the trailing byte to column 3. However, because this is a two-cell character, the code immediately replaces both columns 2 and 3. But this leaves column 1 in a broken state (it's now half a DBCS character), so that gets erased as well.

Now we come to copying what would have been the leading byte from column 1 to column 2, but it's column 1 is now a blank. And once we write a blank into column 2, that leaves column 3 in a broken state, so again that needs to be erased, and everything is blank. At least I that's my understanding of what's going on.

I suspect the erasure of half-DBCS characters is a new thing that was added in the ROW update, and that's possibly why we didn't have this issue before. But I don't want to change that - I'm fairly certain that is the right thing to do. We just need to update the various scrolling operations to take that into account, and I think I've got a simple way we can achieve that.

All we need to do is read two cells from the source, before writing to the target, so the code snippet from above would change to something like this:

auto next = OutputCell(*textBuffer.GetCellDataAt(sourcePos));
do
{
    const auto current = next;
    source.WalkInBounds(sourcePos, walkDirection);
    next = OutputCell(*textBuffer.GetCellDataAt(sourcePos));
    textBuffer.WriteLine(OutputCellIterator({ &current, 1 }), targetPos);
} while (target.WalkInBounds(targetPos, walkDirection));

Now I realise this is not a long term fix - this will assumedly not work once we have characters that can span more than two cells. But it does at least gets things working again without requiring massive changes to the code. And once Leonard is back he can come up with a proper solution.

j4james avatar Jan 07 '23 21:01 j4james

:tada:This issue was addressed in #14650, which has now been successfully released as Windows Terminal Preview v1.17.1023.:tada:

Handy links:

ghost avatar Jan 24 '23 18:01 ghost