[Bug] CPP raw string literals with quotes in delimiter break colorization

Open nebularnoise opened this issue 3 years ago • 1 comments

Reproducible in vscode.dev or in VS Code Desktop?

  • [X] Not reproducible in vscode.dev or VS Code Desktop

Reproducible in the monaco editor playground?

Monaco Editor Playground Code

#include <string>
#include <iostream>


int main(){
    
    auto s = R""""(
    Hello World
    )"""";

    std::cout << "hello";
    
}

Actual Behavior

Colorization breaks in the code window after the raw string literal. This seems to be caused by the delimiter, which contains double quotes.

Expected Behavior

The following line

std::cout << "hello";

should not be colorized entirely in the string colour; only "hello" is a string literal there.

Additional Context

This issue was first opened on godbolt, where I was told to report it upstream: https://github.com/compiler-explorer/compiler-explorer/issues/3684

Note: for the playground editor, I went to https://microsoft.github.io/monaco-editor/index.html and set the language to CPP before pasting the code, since the playground seemed to be JS only.

[Screenshot attached: Screenshot_20220518_124917]

nebularnoise avatar Jun 02 '22 15:06 nebularnoise

I'm very sure this is a problem in the Monarch grammar.

hediet avatar Jul 19 '22 13:07 hediet

I think this might not be possible to implement in Monarch in the general case.

The rule for finding the end of a raw string in https://github.com/microsoft/monaco-editor/blob/main/src/basic-languages/cpp/cpp.ts is /(.*)(\))(?:([^ ()\\\t"]*))(\")/, so it only looks at what's between the ) and the next ". A $3==$S2 case then checks whether that sequence matches the one at the start of the string. Because the character class excludes ", a delimiter that contains quotes can never end up in $3.
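
To see why the quoted delimiter never matches, the end rule can be run outside Monarch on the closing line from the reproduction. This is only a sketch in plain TypeScript: `closesRawString` is a hypothetical helper, and comparing group 3 with the delimiter stands in for the `$3==$S2` case, so it approximates the grammar's behaviour rather than reproducing Monarch itself.

```typescript
// End-of-raw-string pattern as quoted above from cpp.ts.
const endRule: RegExp = /(.*)(\))(?:([^ ()\\\t"]*))(\")/;

// Hypothetical helper: mimics the `$3==$S2` case by comparing capture
// group 3 with the delimiter that opened the raw string.
function closesRawString(line: string, openingDelimiter: string): boolean {
  const m = endRule.exec(line);
  if (m === null) {
    return false;
  }
  return (m[3] ?? "") === openingDelimiter;
}

// Closing line from the report; the literal's delimiter is three quotes.
console.log(closesRawString('    )"""";', '"""')); // false: group 3 is empty
// A delimiter without quotes is captured and compared as intended.
console.log(closesRawString(')foo";', "foo")); // true
```

Since the comparison fails, the raw-string state would never be left, which is consistent with everything after the literal being coloured as a string.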

Maybe there is some fancy state machine trick that can be applied. Otherwise, one temporary solution could be to hardcode tests for delimiter sequences of, say, 10 characters or fewer.

jeremy-rifkin avatar May 25 '23 00:05 jeremy-rifkin

I thought there might be some way to do it with dot-separated sub-states, but now I don't think that's the case. It doesn't appear we can write $S2 within a regex string, though that would be nice.
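
What substituting `$S2` into the pattern would buy can be sketched in plain TypeScript, outside Monarch. Both `escapeRegExp` and `closingPattern` are hypothetical helpers written for this illustration; nothing here is an existing Monarch feature.

```typescript
// Escape characters that are special inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds the closing pattern `)delimiter"` for one
// specific delimiter, which is what a substituted $S2 would amount to.
function closingPattern(rawDelimiter: string): RegExp {
  return new RegExp("\\)" + escapeRegExp(rawDelimiter) + '"');
}

const closer: RegExp = closingPattern('"""');
console.log(closer.test('    )"""";')); // true: `)` + three quotes + closing quote
console.log(closer.test('    )";'));    // false: delimiter is missing
```

Building the pattern per delimiter sidesteps the character-class question entirely, because the delimiter is matched literally instead of being captured and compared afterwards.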

Maybe there's a way with goBack, but I can't think of one.

If only multi-line regexes worked, it could just be /@encoding?R\"(?:([^ ()\\\t]*))\(.*\)\1\"/m.
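
For comparison, the backreference idea does work on the whole source text in plain TypeScript, which is not how Monarch runs, since it tokenizes one line at a time. A sketch under those assumptions: `@encoding` is left out, `[\s\S]*?` is used so the body can span lines (the `m` flag alone only changes `^` and `$`), and the match is lazy so it stops at the first matching closer.

```typescript
// Whole-text pattern: group 1 is the delimiter, \1 requires it again at the end.
const wholeRawString: RegExp = /R"([^ ()\\\t]*)\(([\s\S]*?)\)\1"/;

// The reproduction from the report, joined into one string.
const reproSource: string = [
  'auto s = R""""(',
  "    Hello World",
  '    )"""";',
  'std::cout << "hello";',
].join("\n");

const found = wholeRawString.exec(reproSource);
// Delimiter captured by group 1 (three double quotes for this input).
const foundDelimiter: string = found?.[1] ?? "";
// Everything after the literal, which should be tokenized normally again.
const afterLiteral: string = found
  ? reproSource.slice(found.index + found[0].length)
  : reproSource;

console.log(JSON.stringify(foundDelimiter));
console.log(JSON.stringify(afterLiteral));
```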


Edit the next day: I've learned that the C++ standard does specify a maximum length of 16 characters for the delimiter sequence, so hard-coding would be possible, just horribly ugly. I've opened a PR at microsoft/vscode to expand the functionality of Monarch, and hopefully that goes somewhere.
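
The 16-character limit also means the delimiter part of a pattern can be bounded. A small sketch, assuming the character class from the patterns above is extended with the other control characters the standard excludes; `isValidRawDelimiter` is a hypothetical helper and not part of cpp.ts.

```typescript
// Delimiter rule: at most 16 characters, none of which is a space,
// parenthesis, backslash, tab, vertical tab, form feed, or newline.
const boundedDelimiter: RegExp = /^[^ ()\\\t\v\f\n]{0,16}$/;

function isValidRawDelimiter(candidate: string): boolean {
  return boundedDelimiter.test(candidate);
}

console.log(isValidRawDelimiter('"""'));               // true: quotes are allowed
console.log(isValidRawDelimiter("abcdefghijklmnop"));  // true: exactly 16 characters
console.log(isValidRawDelimiter("abcdefghijklmnopq")); // false: 17 characters
```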

jeremy-rifkin avatar May 25 '23 00:05 jeremy-rifkin