.NET (C#) flavor: SUBSTITUTION field should use verbatim strings, just like the REGULAR EXPRESSION field
Bug Description
#1734 was closed, but I don't think it was correctly resolved.
Reproduction steps
https://regex101.com/r/WYJ3DJ/1
Expected Outcome
a[!]\a"z
That is, the SUBSTITUTION field value, a[$&]\a""z, should be treated as if it were enclosed in @"...", a C# verbatim string - just like the REGULAR EXPRESSION field, which means:
- \ has no special meaning and must be retained verbatim
- a verbatim " must be escaped as "" (otherwise, the string is invalid)
Browser
Brave Browser 106.1.44.101
OS
macOS 12.6
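For reference, the expected outcome can be sketched in plain C#. The input `!` and the pattern `!` are hypothetical, chosen only so that `$&` expands to `!`; the linked regex101 example's actual pattern and test string are not reproduced here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input and pattern, chosen only so that $& expands to "!".
string input = "!";
string pattern = @"!";

// Verbatim string: \ is kept as-is, and "" stands for one literal ".
string substitution = @"a[$&]\a""z";

// In a .NET replacement pattern only $ is special, so \a stays two characters.
string result = Regex.Replace(input, pattern, substitution);

Console.WriteLine(result); // prints: a[!]\a"z
```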
On second thought:
- There is value in treating the SUBSTITUTION field value as a regular "..." C# string, so as to enable use of \n in order to produce newlines in the result, for instance.
- While the latter currently works, \" in the SUBSTITUTION field value does NOT (while unescaped " does), even though that's what it takes to embed a verbatim " inside a C# "..." string.
Thus:
- If only one syntax can be supported, consistent behavior is desirable. And the syntax in effect should ideally be reflected in the UI, just like the REGULAR EXPRESSION field shows the implied delimiters, such as @" ... "
- Ideally, the C# flavor would allow selecting whether @"..." or regular "..." strings are used for the SUBSTITUTION field.
Conceptually related, with respect to quoting styles:
- #1838
I think it would make sense to treat the substitution string as a "normal" string, to allow for shorthands such as \a and \t to be used. This is implicitly handled by the regex engine, regardless of quote style, but not for substitutions: https://github.com/dotnet/runtime/blob/33be69a86167c11064af9c4704c5a7f88a50c75a/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs#L1677
Would that make sense?
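The difference between patterns and substitutions can be checked with a small sketch like the following. The inputs and the single-character pattern `x` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// In a pattern, the regex engine itself interprets \t as a tab,
// even though the verbatim string passes it two characters: \ and t.
bool patternMatchesTab = Regex.IsMatch("\t", @"\t");
Console.WriteLine(patternMatchesTab); // True

// In a replacement string, the engine does no such translation:
// the same two characters come out unchanged.
string verbatimResult = Regex.Replace("x", "x", @"\t");
Console.WriteLine(verbatimResult.Length); // 2

// With a regular string, the C# compiler turns \t into a tab
// before the regex engine ever sees it.
string regularResult = Regex.Replace("x", "x", "\t");
Console.WriteLine(regularResult == "\t"); // True
```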
Yes, as you state, it is C#'s string handling, not the regex engine, that handles escape sequences such as \n and \t in the substitution string.
I agree that there is value in supporting these, but ideally there'd be a choice of whether to use implied "..." or implied @"...".
The minimal action would be to make it clear in the UI that the substitution string is implicitly "..." (analogous to how the regex field already makes it clear that @"..." is implied). That way, users aren't baffled that a verbatim " must be escaped as "" in the @"..." string that defines the regex, whereas it should be \" in the substitution string if that is conceived as implicitly "..."-enclosed. Unexpectedly, though, the substitution field currently requires an unescaped " (using \" actually reports a pattern error, which is really a separate problem).
You are right that the long term solution would likely be to introduce the ability to select in the UI, but until I get time to work on that solution, we need something short term.
I would suggest double quotes for substitution strings, and verbatim for the rest. Thoughts?
That makes sense.
If you have the time, I think having the UI signal that "..." is (for now invariably) used in the substitution string would be helpful.
Hi, I think the best solution is to provide the following two input forms:
- Literal text. What the user enters in the input box is what the engine finally receives. For example, when you type \n, the regex engine will receive two characters, \ and n. This is how RegexBuddy does it.
  Note that this is not equivalent to a verbatim string (raw string), because a verbatim string still has to escape its delimiters (quotation marks).
- Normal strings. The user input will be parsed as a normal string, and the regex engine will receive the characters that the normal string actually represents. For example, when you type \n, the regex engine will receive one character, a newline.
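The two input forms, and the reason literal text differs from a verbatim string, can be illustrated with a rough C# sketch. The variable names are illustrative only; C# literals stand in here for what the user would type into the input box.

```csharp
using System;

// Literal-text mode: typing \n hands the engine two characters, \ and n.
string literalText = @"\n";
Console.WriteLine(literalText.Length); // 2

// Normal-string mode: typing \n hands the engine one newline character.
string normalString = "\n";
Console.WriteLine(normalString.Length); // 1

// Why literal text is not the same as a verbatim string: the four characters
// "hi" cannot be typed as-is inside @"..."; each quote must be doubled.
string quoted = @"""hi""";
Console.WriteLine(quoted);        // "hi"
Console.WriteLine(quoted.Length); // 4
```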
It would be better to apply this to both the regex input box and the replacement input box, as well as to all flavors.
The default option should definitely be pure literal text, with no escaping at all.
The escaping is only a feature of a programming language, and only when writing string literals in that language. In this case C#. But the actually expressed value (the regex pattern) is a feature of .NET, to be parsed by the regex engine of .NET.
In fact, a regex pattern may be read from any input, like a configuration file or database table. A pattern in a database will probably not be escaped. And a pattern as written in a configuration file can be any kind of escaped value according to the format of that file (JSON uses the very standard backslash notation, for example).
Also note that at least Visual Studio automatically adds the escaping for you when you copy-paste text into a string literal between quotes. And there are probably more IDEs doing the same thing these days.
One last note: I have been using verbatim strings far less often since the new raw string literals (also mentioned above by Serious54088) came out with C# 11.
Verbatim strings are becoming somewhat obsolete now.
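To compare the literal styles discussed in this thread, here is a small sketch that writes the substitution value from the original report three ways. It assumes a C# 11 or later compiler for the raw string literal.

```csharp
using System;

// Three ways to write the same nine-character value a[$&]\a"z in C# source.
string regular  = "a[$&]\\a\"z";    // regular string: escape \ and "
string verbatim = @"a[$&]\a""z";    // verbatim string: double the "
string raw      = """a[$&]\a"z""";  // raw string literal (C# 11+): nothing to escape

Console.WriteLine(regular == verbatim && verbatim == raw); // True
Console.WriteLine(raw);                                    // a[$&]\a"z
```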