Raw strings containing string terminator characters break syntax highlighting
App Version: 2025.1.1
Affected Device(s): Samsung S24 Ultra with Android 15.0
Describe the bug: In languages that allow non-escaped strings (e.g. raw strings), if the string contains an unescaped string terminator character (" or '), syntax highlighting treats it as the end of the string. Any subsequent string terminator (e.g. the actual end of the raw string) is then misread as well, and syntax highlighting breaks, as shown below:
To Reproduce: Steps to reproduce the behavior:
- Create a new file in a language that supports raw strings
- In a raw string literal, insert the appropriate string terminator character for your language of choice
- Observe how syntax highlighting is broken on subsequent lines, up until the next unescaped string terminator
Expected behavior: Raw strings should treat every character inside them as a literal character, so that an embedded quote does not end the string.
Even ChatGPT has the same issue 😄
This is due to a bit of an error on my end: the second screenshot should look like this, with a # before and after the string literal, but even then it still breaks:
In your example, it'd be like so:
let a = r"string";
let b = br"string";
let c = r#""string"#;
let d = br#""string"#;
As per the Rust language reference:
Raw string literals do not process any escapes. They start with the character U+0072 (r), followed by fewer than 256 of the character U+0023 (#) and a U+0022 (double-quote) character.
The raw string body can contain any sequence of Unicode characters other than U+000D (CR). It is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character.
All Unicode characters contained in the raw string body represent themselves, the characters U+0022 (double-quote) (except when followed by at least as many U+0023 (#) characters as were used to start the raw string literal) or U+005C (\) do not have any special meaning.
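To make the termination rule above concrete, here is a minimal sketch of a scanner that finds the end of a raw string the way the reference describes it. The function name raw_string_len is hypothetical and is not taken from any highlighter or from rustc; it also ignores the U+000D (CR) restriction for brevity.

```rust
/// Hypothetical helper for illustration only.
/// Returns the byte length of the raw (byte) string literal at the start of
/// `src`, or `None` if `src` does not begin with a complete raw string.
fn raw_string_len(src: &str) -> Option<usize> {
    let bytes = src.as_bytes();
    let mut i = 0;

    // Optional `b` prefix for raw byte strings such as br"...".
    if bytes.get(i) == Some(&b'b') {
        i += 1;
    }
    if bytes.get(i) != Some(&b'r') {
        return None;
    }
    i += 1;

    // Count the `#` characters that precede the opening quote.
    let mut hashes = 0;
    while bytes.get(i) == Some(&b'#') {
        hashes += 1;
        i += 1;
    }
    // The reference allows fewer than 256 `#` characters.
    if hashes >= 256 || bytes.get(i) != Some(&b'"') {
        return None;
    }
    i += 1;

    // A quote only terminates the literal when it is followed by the same
    // number of `#` characters; any other quote is part of the body.
    while i < bytes.len() {
        if bytes[i] == b'"' {
            let tail = &bytes[i + 1..];
            if tail.len() >= hashes && tail[..hashes].iter().all(|&c| c == b'#') {
                return Some(i + 1 + hashes);
            }
        }
        i += 1;
    }
    None
}
```

With the examples above, the quote right after the opening r#" does not end the literal, because it is not followed by a #; only the final "# does. A highlighter that stops at the first quote it sees is not applying this rule.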
I tried the official TextMate grammar for Rust (rust.tmGrammar.json), which was used in VS Code, and unfortunately it has the same issue 😕
It seems that VS Code now uses a different grammar, in this repository, which seems to not have this issue, though I can't check right now
Yes, I also tried that one, and it's the same. I'm thinking it might be an Android-specific bug in the joni library, which may not support all regex rules.
Hmm. It might be worth looking into different regex engines, but that seems complicated...
Does joni support begin/end/while backreferences?
TextMate allows end/while to reference capture groups in the begin regex via \1.
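For reference, here is a rough sketch of how such a backreference could be resolved, assuming the grammar engine substitutes the text captured by the begin match into the end pattern before compiling it. That substitution strategy is my assumption about how TextMate-style engines handle this, and the names escape_regex and resolve_end_pattern are hypothetical, not part of joni or any grammar library.

```rust
/// Hypothetical helper: escapes regex metacharacters so that captured text
/// is matched literally once it is spliced into the end pattern.
fn escape_regex(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        if "\\^$.|?*+()[]{}-#".contains(ch) {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Hypothetical helper: replaces `\0`..`\9` in `end_pattern` with the
/// escaped text of the matching capture group from the begin match.
/// `captures[0]` is the whole begin match, `captures[1]` is group 1, etc.
fn resolve_end_pattern(end_pattern: &str, captures: &[&str]) -> String {
    let mut out = String::new();
    let mut chars = end_pattern.chars().peekable();
    while let Some(ch) = chars.next() {
        if ch != '\\' {
            out.push(ch);
            continue;
        }
        match chars.peek().copied() {
            // A backslash followed by a digit is a backreference.
            Some(d) if d.is_ascii_digit() => {
                chars.next();
                let idx = d.to_digit(10).unwrap() as usize;
                let text = captures.get(idx).copied().unwrap_or("");
                out.push_str(&escape_regex(text));
            }
            // Any other escape (e.g. `\\` or `\.`) is kept unchanged.
            Some(other) => {
                chars.next();
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'),
        }
    }
    out
}
```

For a rule with begin r(#*)" and end "\1, a begin match of r##" captures ## as group 1, so the resolved end pattern becomes "\#\#. Under this assumption the regex engine only ever sees an ordinary pattern, so whether the raw string rule works would depend on the grammar layer doing this substitution rather than on the regex engine itself.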