Squircle-CE

Raw strings containing string terminator characters break syntax highlighting

Open • tanchevk opened this issue 10 months ago • 8 comments

App Version: 2025.1.1

Affected Device(s): Samsung S24 Ultra with Android 15.0

Describe the bug: In languages that support strings without escape processing (e.g. raw strings), if the string contains an unescaped string terminator character (" or '), syntax highlighting treats it as the end of the string. The real closing delimiter (e.g. the end of the raw string) is then treated as the start of a new string, which breaks highlighting from that point on, as shown below:

[Screenshot 1]

[Screenshot 2]

To Reproduce:

  1. Create a new file in a language that supports raw strings
  2. In a raw string literal, insert the appropriate string terminator character for your language of choice
  3. Observe that syntax highlighting breaks on subsequent lines, up to the next unescaped string terminator

Expected behavior: Every character inside a raw string should be treated literally (as if escaped), so that only the actual closing delimiter ends the string.
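To make the failure mode concrete, here is a hypothetical, deliberately naive Rust scanner. The function name and offsets are illustrative assumptions and do not reflect Squircle-CE's actual tokenizer. It ends a raw string at the first " after the opening r#" and ignores the # count. On a raw string whose body begins with a quote, that reproduces the symptom described above.

```rust
/// Hypothetical, deliberately naive scanner: given the byte offset just past
/// an opening `r#"`, it ends the string at the first `"` and ignores the `#`s.
/// It only mimics the symptom in this issue; it is not Squircle-CE's code.
fn naive_string_end(src: &str, body_start: usize) -> Option<usize> {
    src[body_start..].find('"').map(|i| body_start + i)
}

fn main() {
    // Rust source to highlight: the raw string's body is `"string`.
    let src = r###"let c = r#""string"#;"###;
    let body_start = src.find("r#\"").expect("opening delimiter") + 3; // skip `r#"`
    let end = naive_string_end(src, body_start).expect("a closing quote");
    // The string is taken to end at the body's own leading quote...
    println!("naive string body: {:?}", &src[body_start..end]);
    // ...so `string"#;` is left over, and its `"` opens a new, unterminated string.
    println!("left over: {:?}", &src[end + 1..]);
}
```

A correct scanner has to remember how many # characters opened the literal and accept only a " followed by that many #s as the end.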

tanchevk • May 04 '25 12:05

Even ChatGPT has the same issue 😄

[Screenshot]

massivemadness • May 05 '25 16:05

This is partly an error on my end: the second screenshot should have had a # before and after the string literal, as below. Even then, highlighting still breaks:

[Screenshot]

In your example, it'd be like so:

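```rust
fn main() {
    let a = r"string";
    let b = br"string";
    let c = r#""string"#; // raw string whose body starts with a `"`
    let d = br#""string"#;
    println!("{a} {b:?} {c} {d:?}");
}
```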

tanchevk • May 05 '25 17:05

As per the Rust language reference:

Raw string literals do not process any escapes. They start with the character U+0072 (r), followed by fewer than 256 of the character U+0023 (#) and a U+0022 (double-quote) character.

The raw string body can contain any sequence of Unicode characters other than U+000D (CR). It is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character.

All Unicode characters contained in the raw string body represent themselves, the characters U+0022 (double-quote) (except when followed by at least as many U+0023 (#) characters as were used to start the raw string literal) or U+005C (\) do not have any special meaning.
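The rules quoted above translate fairly directly into a scanner. The sketch below is a hypothetical helper, not code from Squircle-CE or from any TextMate grammar. It counts the opening # characters (fewer than 256), requires the opening ", and then searches for the first " followed by that many #s. It assumes any b prefix has already been consumed, and it does not check the CR restriction.

```rust
/// Hypothetical raw-string scanner following the Rust reference rules.
/// `src` must start at the `r` (a `b` prefix is assumed already consumed).
/// Returns the byte length of the whole literal, or None if `src` does not
/// start a raw string or the literal is unterminated.
fn raw_string_len(src: &str) -> Option<usize> {
    let rest = src.strip_prefix('r')?;
    // Count the opening `#`s; the reference allows fewer than 256 of them.
    let hashes = rest.bytes().take_while(|&b| b == b'#').count();
    if hashes >= 256 {
        return None;
    }
    // The hashes must be followed by the opening double quote.
    let body = rest[hashes..].strip_prefix('"')?;
    // Only a `"` followed by the same number of `#`s closes the literal;
    // a `"` with fewer `#`s is ordinary body text.
    let closing = format!("\"{}", "#".repeat(hashes));
    let body_len = body.find(&closing)?;
    // `r` + hashes + opening quote + body + closing delimiter
    Some(1 + hashes + 1 + body_len + closing.len())
}

fn main() {
    // The last case contains `"#` inside a literal opened with two `#`s.
    for src in [r#"r"string";"#, r###"r#""string"#;"###, r####"r##"a "# b"##;"####] {
        let len = raw_string_len(src).expect("raw string literal");
        println!("{} -> literal {:?}", src, &src[..len]);
    }
}
```

The key design point is that the closing delimiter is built from the opening one, so the same " can be body text in one literal and a terminator in another.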

tanchevk • May 05 '25 17:05

I tried the official TextMate grammar for Rust (rust.tmGrammar.json), which was used in VS Code, and unfortunately it has the same issue 😕

massivemadness • May 06 '25 07:05

It seems VS Code now uses a different grammar, in this repository, which doesn't appear to have this issue, though I can't check right now.

tanchevk • May 06 '25 07:05

Yes, I also tried this one, and it's the same. I suspect it might be an Android-specific bug in the Joni library, which doesn't support all regex features.

massivemadness • May 06 '25 08:05

Hmm. It might be worth looking into different regex engines, but that seems complicated...

tanchevk • May 06 '25 09:05

Does Joni support begin/end/while backreferences? TextMate allows end/while patterns to reference capture groups in the begin regex via \1.

RedCMD • Sep 06 '25 05:09
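On the begin/end backreferences asked about in the last comment: in TextMate grammars, a \1 inside an end (or while) pattern refers to a group captured by the begin match, not to a group of the same regex. A TextMate-style engine therefore typically substitutes the captured text, regex-escaped, into the end pattern before compiling it. The hypothetical sketch below shows only that substitution step, in plain Rust with no regex library. The function names, the escaped character set, and the capture layout (group 2 holding the # run) are all assumptions for illustration.

```rust
/// Hypothetical helper: backslash-escape characters that are special in
/// Oniguruma-style regexes so captured text is matched literally.
fn escape_regex(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        if "\\|{}[]()?*+^$.#-".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Hypothetical helper: replace each `\N` in an `end` pattern with the
/// escaped text that group N captured in the `begin` match. Other escapes
/// (such as `\\` or `\"`) are copied through unchanged.
fn resolve_back_references(end_pattern: &str, begin_captures: &[&str]) -> String {
    let mut out = String::new();
    let mut chars = end_pattern.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // Collect the digits (if any) that follow the backslash.
        let mut digits = String::new();
        while let Some(&d) = chars.peek() {
            if !d.is_ascii_digit() {
                break;
            }
            digits.push(d);
            chars.next();
        }
        if digits.is_empty() {
            // Not a backreference: keep the backslash and the escaped character.
            out.push('\\');
            if let Some(next) = chars.next() {
                out.push(next);
            }
        } else {
            // A missing or out-of-range group resolves to the empty string here.
            let captured = digits
                .parse::<usize>()
                .ok()
                .and_then(|n| begin_captures.get(n).copied())
                .unwrap_or("");
            out.push_str(&escape_regex(captured));
        }
    }
    out
}

fn main() {
    // Assumed begin match for `r#"`: group 0 = whole match, 1 = "r", 2 = "#".
    let begin_captures = ["r#\"", "r", "#"];
    // Assumed end pattern: a quote followed by whatever group 2 captured.
    let end = resolve_back_references(r#""(\2)"#, &begin_captures);
    println!("{end}");
}
```

With this substitution, a raw string opened by r#" can only be closed by "#. This thread has not established whether the engine Squircle-CE uses performs this step itself or passes \1 to Joni unchanged.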