Code completion for mixed whitespace characters in raw string

Open ufcpp opened this issue 3 years ago • 6 comments

Version Used:

Visual Studio Version 17.4.0 Preview 2.0 C# Tools 4.4.0-2.22430.14+2f760738cb92f32f50c981b68ba04ac3c8b7ee48

Steps to Reproduce:

_ = """
             All whitespace characters are the same as the closing line.
             U+20, U+2000, U+2001, U+2002, U+2003, U+2004, U+2005, U+2006, U+2007, U+2008, U+2009, U+200A, U+3000
             Please insert a new line here:
             """;

https://sharplab.io/#v2:CYLg1APg+gBAvDARMgsAKBoAAJCABIIAJDABICAEgoASBgBIOAEgEASCQBIFAEgAAwCCANszAO4AWAlgC4CmAZwAOAQwDG/GOM6iAThIFzBMeVN6cpg0QFspolRqnjmAe0HcAdgHMYzK/wB06LHiJkqdegFUwAJgAGABoYX0CA4ND/CIBGELCIv3jogIBmZPCAFgyIgFYcgIA2AoB2AoAOAoBOAsZk1IiAlxwCEgoaBgAFZn4DKStBfjleVRhLfnY7BxhNOX4QZrc2zwZkRABudCA===

Expected Behavior:

_ = """
             All whitespace characters are the same as the closing line.
             U+20, U+2000, U+2001, U+2002, U+2003, U+2004, U+2005, U+2006, U+2007, U+2008, U+2009, U+200A, U+3000
             Please insert a new line here:
             Whitespaces are copied from the closing line.
             """;

https://sharplab.io/#v2:CYLg1APg+gBAvDARMgsAKBoAAJCABIIAJDABICAEgoASBgBIOAEgEASCQBIFAEgAAwCCANszAO4AWAlgC4CmAZwAOAQwDG/GOM6iAThIFzBMeVN6cpg0QFspolRqnjmAe0HcAdgHMYzK/wB06LHiJkqdegFUwAJgAGABoYX0CA4ND/CIBGELCIv3jogIBmZPCAFgyIgFYcgIA2AoB2AoAOAoBOAsZk1IiAlxwCEgoaBgAFZn4DKStBfjleVRhLfnY7BxhNOX4QZrc2zwYAdR4BEQkhVTnpU2FufmAYADM5Ux0YI2kzCxsp8ecMFvd2r2REAG4gA=

Actual Behavior:

_ = """
             All whitespace characters are the same as the closing line.
             U+20, U+2000, U+2001, U+2002, U+2003, U+2004, U+2005, U+2006, U+2007, U+2008, U+2009, U+200A, U+3000
             Please insert a new line here:
             Visual Studio IDE inserts ASCII spaces (U+0020), which causes CS9003 Error.
             """;

https://sharplab.io/#v2:CYLg1APg+gBAvDARMgsAKBoAAJCABIIAJDABICAEgoASBgBIOAEgEASCQBIFAEgAAwCCANszAO4AWAlgC4CmAZwAOAQwDG/GOM6iAThIFzBMeVN6cpg0QFspolRqnjmAe0HcAdgHMYzK/wB06LHiJkqdegFUwAJgAGABoYX0CA4ND/CIBGELCIv3jogIBmZPCAFgyIgFYcgIA2AoB2AoAOAoBOAsZk1IiAlxwCEgoaBgAFZn4DKStBfjleVRhLfnY7BxhNOX4QFxgl5ZWlgDVuQQBXUTYAZV4t4G5TGABJABEAURgBod4VRj2AYTOzmBEJIRgACl9EgIAShCXG4Mmkoi2gxUzz2VQiqRgVzkclMcmcGBa7naXmQiAA3EA===
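For context, the error above follows from the raw string literal rule that each content line must begin with exactly the same whitespace characters as the closing `"""` line. The following is a minimal sketch of that comparison, not the compiler's code: the class and method names are hypothetical, and whitespace-only lines are skipped as a simplification.

```csharp
using System;
using System.Collections.Generic;

static class RawStringIndentCheck
{
    // Hypothetical helper: returns the zero-based indexes of content lines that do
    // not start with the closing line's exact indentation characters.
    public static List<int> FindMismatchedLines(IReadOnlyList<string> contentLines, string closingIndent)
    {
        var mismatched = new List<int>();
        for (int i = 0; i < contentLines.Count; i++)
        {
            string line = contentLines[i];

            // Simplification (assumption): whitespace-only lines are not checked here.
            if (string.IsNullOrWhiteSpace(line))
                continue;

            // Ordinal comparison so that U+0020 and U+2000 are never treated as equal.
            if (!line.StartsWith(closingIndent, StringComparison.Ordinal))
                mismatched.Add(i);
        }
        return mismatched;
    }
}
```

With a closing indentation of `"\u0020\u2000\u3000"`, a line indented with three ASCII spaces would be flagged, which mirrors the mismatch the issue reports as CS9003.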

As a side note, "Convert to raw string" code fix can handle mixed whitespaces correctly:

image

ufcpp avatar Sep 23 '22 02:09 ufcpp

Could you link to a file that contains the code in question? The above explanation is a bit confusing. Thanks!

CyrusNajmabadi avatar Sep 23 '22 03:09 CyrusNajmabadi

Updated with SharpLab links to the code.

FYI, you can reproduce the issue using only ASCII spaces (U+0020) and tabs (U+0009).

ufcpp avatar Sep 23 '22 03:09 ufcpp

Could you just make an actual file? I really do not trust having to go into SharpLab to determine what actual characters are at an actual position. A file means there is no confusion at all about the individual bytes in the file. Thanks!

CyrusNajmabadi avatar Sep 23 '22 03:09 CyrusNajmabadi

Can you download it from https://gist.github.com/ufcpp/0366469fe355705fdfb062ce98570586 ?

ufcpp avatar Sep 23 '22 03:09 ufcpp

Copying and pasting the following code should print 20 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 200A 3000:

foreach (var c in "             ")
{
    Console.WriteLine($"{(int)c:X}");
}

ufcpp avatar Sep 23 '22 03:09 ufcpp
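Because whitespace characters are easily normalized when copied through a web page, a variant that spells the same 13 characters as `\u` escapes may be more robust. This is only a sketch: the class and member names are hypothetical, and the character list is taken from the reproduction steps above.

```csharp
using System;
using System.Linq;

static class WhitespaceDump
{
    // The 13 whitespace characters listed in the issue, written as escapes so that
    // copy-and-paste cannot silently replace them with ASCII spaces.
    public const string Indentation =
        "\u0020\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u3000";

    // Formats each UTF-16 code unit of the input as uppercase hexadecimal.
    public static string[] ToHex(string text) =>
        text.Select(c => ((int)c).ToString("X")).ToArray();
}
```

Calling `Console.WriteLine(string.Join(" ", WhitespaceDump.ToHex(WhitespaceDump.Indentation)))` prints the code points on one line for comparison with the list above.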

Thanks. That's probably more than sufficient. I can look on Tuesday!

CyrusNajmabadi avatar Sep 23 '22 03:09 CyrusNajmabadi

On a separate note, I've found another strange behavior in Visual Studio while experimenting with this issue. Visual Studio seems to display VT (U+000B) and FF (U+000C) as ♂ and ♀ respectively.

https://github.com/ufcpp/UfcppSample/blob/master/Demo/2022/Csharp11/17.2p1/RawStringLiteral/Whitespaces.cs

image

ufcpp avatar Sep 23 '22 14:09 ufcpp

@ufcpp That sounds like the file was interpreted as CP437.

svick avatar Sep 23 '22 17:09 svick

@ufcpp Please file that through normal vs feedback. Thanks!

CyrusNajmabadi avatar Sep 23 '22 18:09 CyrusNajmabadi

It doesn't seem to be CP437.

image

I just heard on Twitter that this glyph is used when displaying control characters in Wingdings.

> Please file that through normal vs feedback. Thanks!

I'll do it.

ufcpp avatar Sep 24 '22 01:09 ufcpp

https://developercommunity.visualstudio.com/t/Visual-Studio-IDE-displays-ASCII-control/10156578 done.

ufcpp avatar Sep 24 '22 02:09 ufcpp

@allisonchou The issue here is in the "indentation service" portion of VS. Specifically, it operates by querying us for the column the user's caret should be placed at. However, for raw strings, this really isn't the concept we want. For example, in this user's code the indentation is made up not of spaces/tabs, but of specialized whitespace characters.

In order to keep things functioning properly here, we should not do this processing through the indentation system, but have a specialized command handler (like RawStringLiteralCommandHandler) which intercepts this and places the exact right whitespace here before the caret.

That said, this is likely low priority. It will only affect people who indent with something other than the common indentation strategies (spaces/tabs). So it can likely stay on the backlog unless we hear about this affecting more people.

CyrusNajmabadi avatar Sep 30 '22 18:09 CyrusNajmabadi
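The idea in the comment above, inserting the exact whitespace rather than a computed column, could be sketched roughly as follows. This is not Roslyn's implementation and does not use any editor API: it is a plain string helper with hypothetical names, assuming the handler already has the text of the line containing the closing `"""`.

```csharp
using System;
using System.Globalization;

static class RawStringNewLineSketch
{
    // Whitespace as the C# language defines it: Unicode category Zs plus
    // horizontal tab, vertical tab, and form feed.
    static bool IsCSharpWhitespace(char c) =>
        c == '\t' || c == '\v' || c == '\f' ||
        CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.SpaceSeparator;

    // Returns the exact run of whitespace characters at the start of the closing line.
    public static string GetClosingIndentation(string closingLine)
    {
        int end = 0;
        while (end < closingLine.Length && IsCSharpWhitespace(closingLine[end]))
            end++;
        return closingLine.Substring(0, end);
    }

    // Text to insert when Enter is pressed inside the literal: a line break followed
    // by a verbatim copy of the closing indentation, instead of N ASCII spaces.
    public static string GetTextToInsert(string closingLine, string newLine = "\r\n") =>
        newLine + GetClosingIndentation(closingLine);
}
```

Copying the characters verbatim sidesteps the column abstraction entirely, so the inserted line starts with the same whitespace as the closing line regardless of which characters the user chose.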