
C# Codegenerator: Escape character in substitution unnecessary, even wrong

Status: Open · NCC1701M opened this issue 8 months ago · 3 comments

Bug Description

When the generated code uses @"" to mark a string as a verbatim string literal, escape sequences are not needed. A backslash is written as a single \ in the text. Escape sequences such as \n or \t are not supported either; the string must contain the actual non-visible characters (newline, tab) instead.
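A minimal C# sketch of the verbatim-literal rules described above (the class and field names are illustrative only). It compares a verbatim literal with one backslash, the equivalent regular literal, and the doubled form the Code Generator emits:

```csharp
using System;

public static class VerbatimDemo
{
    // In a verbatim literal, \ is an ordinary character: this string has ONE backslash.
    public static readonly string Verbatim = @"Hello World\Earth";

    // The same value as a regular literal, where the backslash must be escaped.
    public static readonly string Regular = "Hello World\\Earth";

    // What the Code Generator emitted: a verbatim literal with TWO backslashes.
    public static readonly string Doubled = @"Hello World\\Earth";

    // \t is not an escape inside @"...": this is a backslash followed by 't'.
    public static readonly string NotATab = @"a\tb";

    public static void Main()
    {
        Console.WriteLine(Verbatim == Regular); // True
        Console.WriteLine(Doubled == Regular);  // False
        Console.WriteLine(NotATab.Length);      // 4
    }
}
```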

Reproduction steps

Set up regex101 like this:

Flavor: C#
Function: Substitution
Regular Expression: Hello\sWorld
Test String: This is some text with a Hello World in it
Substitution: Hello World\Earth

In the result view you see the correct result:

This is some text with a Hello World\Earth in it

Switch to Code Generator and you should see code like this:

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"Hello\sWorld";
        string substitution = @"Hello World\\Earth";
        string input = @"This is some text with a Hello World in it";
        RegexOptions options = RegexOptions.Multiline;
        
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
    }
}

Add the line Console.WriteLine(result); at the end of Main and run the code on try.dot.net.

The output is This is some text with a Hello World\\Earth in it instead of the expected outcome.
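A minimal repro sketch that runs the generated substitution next to a single-backslash version (the class name SubstitutionRepro and helper Run are illustrative). In .NET replacement patterns only $ substitutions are special; a backslash is copied as-is, so the doubled backslash from the verbatim literal reaches the output unchanged:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionRepro
{
    const string Pattern = @"Hello\sWorld";
    const string Input = @"This is some text with a Hello World in it";

    // Same steps as the generated code: build the Regex, then Replace.
    public static string Run(string substitution)
    {
        var regex = new Regex(Pattern, RegexOptions.Multiline);
        return regex.Replace(Input, substitution);
    }

    public static void Main()
    {
        // As generated: the verbatim literal keeps both backslashes.
        Console.WriteLine(Run(@"Hello World\\Earth"));
        // Corrected: a single backslash inside @"...".
        Console.WriteLine(Run(@"Hello World\Earth"));
    }
}
```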

Expected Outcome

This is some text with a Hello World\Earth in it

Possible fixes: either emit the correct, unescaped characters inside the @"" literal, or drop the @ prefix and keep the escape sequences. Alternatively, make the substitution input a textarea, like the Test String input, so that multiline replacements can be typed directly; the Test String input is already formatted correctly in the generated code.
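Either fix amounts to emitting a literal that decodes to the same string the user sees on regex101. A hypothetical sketch of both options follows; the helper names ToVerbatim and ToRegular are illustrative and not regex101's actual code, and ToRegular handles only the common escapes listed, not every C# control character:

```csharp
using System;
using System.Text;

public static class CSharpLiteral
{
    // Option 1, verbatim @"..." literal: only the double quote needs escaping (as "").
    // Backslashes, newlines and tabs are copied as-is.
    public static string ToVerbatim(string value) =>
        "@\"" + value.Replace("\"", "\"\"") + "\"";

    // Option 2, regular "..." literal: escape backslash, quote and common control characters.
    public static string ToRegular(string value)
    {
        var sb = new StringBuilder("\"");
        foreach (char c in value)
        {
            switch (c)
            {
                case '\\': sb.Append("\\\\"); break;
                case '"': sb.Append("\\\""); break;
                case '\n': sb.Append("\\n"); break;
                case '\r': sb.Append("\\r"); break;
                case '\t': sb.Append("\\t"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.Append('"').ToString();
    }

    public static void Main()
    {
        string substitution = @"Hello World\Earth"; // one backslash
        Console.WriteLine(ToVerbatim(substitution)); // @"Hello World\Earth"
        Console.WriteLine(ToRegular(substitution));  // "Hello World\\Earth"
    }
}
```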

Browser

Tested with:

Firefox 120.0.1 (64-bit)
MS Edge 119.0.2151.97 (64-bit)
Google Chrome 119.0.6045.200 (64-bit)

OS

Windows 11 22H2

NCC1701M · Dec 06 '23 08:12

#2186

working-name · Dec 06 '23 17:12

@NCC1701M: should the substitution text on regex101 be:

  1. Hello World\Earth with a single \ backslash, or
  2. Hello World\\Earth with two \\ backslashes?

When I use a single backslash, I see this in the SUBSTITUTION field: [image]

When I use two backslashes, I see this: [image]

This does change the Code Generator string substitution = ... text, of course, but I think that people might worry about an error in their pattern if they see an error like this on the regex101 page.

However, it does appear that the generated code uses @"..." (or whatever delimiter had been selected) for that string substitution = @"..."; field as a string literal instead of a raw string.

I had thought we added the @"..." string literal to the Code Generator output because it fixed a previous bug where certain conditions were not being escaped correctly...

OnlineCop · Dec 07 '23 16:12

@OnlineCop

The substitution result should be, as I described in the Expected Outcome section, This is some text with a Hello World\Earth in it, with a single \.

If the substitution field requires a double \\ because a single \ is used as an escape character for newlines, tabs, etc., that's fine, but the generated code must contain a single \.
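If the substitution field does use \ as an escape character, the generator would need to decode the field before writing the verbatim literal. A hypothetical sketch, assuming the field treats \\, \n and \t as escapes (this is an assumption drawn from the comment above, not confirmed regex101 behavior); the class and method names are illustrative:

```csharp
using System;
using System.Text;

public static class SubstitutionInput
{
    // Hypothetical decoder: assumes the UI field treats \\, \n and \t as escapes.
    public static string Unescape(string field)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < field.Length; i++)
        {
            char c = field[i];
            if (c == '\\' && i + 1 < field.Length)
            {
                switch (field[i + 1])
                {
                    case '\\': sb.Append('\\'); i++; continue;
                    case 'n': sb.Append('\n'); i++; continue;
                    case 't': sb.Append('\t'); i++; continue;
                }
            }
            // Unknown escapes and a trailing backslash are kept literally.
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string field = @"Hello World\\Earth";   // as typed in the field
        string value = Unescape(field);         // one backslash
        // Emit as a verbatim literal: only " needs doubling.
        Console.WriteLine("@\"" + value.Replace("\"", "\"\"") + "\""); // @"Hello World\Earth"
    }
}
```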

As I suggested above, it might be easier if the substitution input field supported multiline text, so that the user could enter text like:

This is my
multi line substitution
with a simple \ in it.

instead of This is my\nmulti line substitution\nwith a simple \\ in it.
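For comparison, a C# verbatim literal can already hold such a multiline value directly. A sketch with illustrative names; note that the line breaks inside @"..." are whatever the source file uses, so a file saved with CRLF line endings would yield \r\n, which the comparison normalizes:

```csharp
using System;

public static class MultilineLiteral
{
    // A verbatim literal may span lines; each line break becomes part of the string.
    public const string Verbatim = @"This is my
multi line substitution
with a simple \ in it.";

    // The equivalent regular literal, written with escape sequences.
    public const string Regular = "This is my\nmulti line substitution\nwith a simple \\ in it.";

    public static void Main()
    {
        // Normalize CRLF in case the source file uses Windows line endings.
        Console.WriteLine(Verbatim.Replace("\r\n", "\n") == Regular); // True
    }
}
```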

NCC1701M · Dec 07 '23 16:12