STL icon indicating copy to clipboard operation
STL copied to clipboard

<regex>: Goofy message for error_badbrace

Open Alcaro opened this issue 1 year ago • 7 comments

Describe the bug

The error_badbrace string is significantly goofy looking

Command-line test case

#include <regex>

int main()
{
    try {
        std::regex r("a{4,3}");
        puts("it's legal");
    } catch (std::exception& e) {
        puts(e.what());
    }
}

https://godbolt.org/z/Y14Exavx7

Expected behavior

The expression contained an invalid range in a {} expression.

Actual behavior

The expression contained an invalid range in a { expression }.

Why was the closing brace moved one word to the right? That's neither consistent with the C++ spec, nor with my understanding of English grammar.

STL version

Don't know, ask Godbolt

Additional context

Probably a https://github.com/microsoft/STL/labels/good%20first%20issue

Alcaro avatar Sep 30 '24 13:09 Alcaro

Yes, I agree. Code: https://github.com/microsoft/STL/blob/faccf0084ed9b8b58df103358174537233b178c7/stl/inc/regex#L500-L502

Why was the closing brace moved one word to the right?

Probably a copy-paste error. Look at the messages above: https://github.com/microsoft/STL/blob/faccf0084ed9b8b58df103358174537233b178c7/stl/inc/regex#L491-L499

https://godbolt.org/z/rKrbha61c https://godbolt.org/z/bMdzrEhnv https://godbolt.org/z/c8x4oTGEx

Click to expand unrelated discussion of other regex behavior:

By the way, we found a bug in the GCC standard library. I will check if that bug has been reported by someone; if not, I will create an issue. An example with the bug: https://godbolt.org/z/c8x4oTGEx

fsb4000 avatar Sep 30 '24 14:09 fsb4000

Looks like GCC accepts } as a literal character. It does the same with ] (but not )). https://godbolt.org/z/sds8jf9W6

Combine that with https://godbolt.org/z/oMvEr5YTs, and we've got accepts-invalid bugs in all three regex engines. Fun. (libc++ reported as https://github.com/llvm/llvm-project/issues/110339, MS-STL ... I think it's ossified into a vendor extension, but I might as well report it properly so we have a proper decision on it.)

Alcaro avatar Sep 30 '24 14:09 Alcaro

I've created: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116903

fsb4000 avatar Sep 30 '24 14:09 fsb4000

They answered that this is a valid regex. and Google Chrome and Firefox think the same.

alert(RegExp("a{0}}").test("}"));

Both browsers print "true".

fsb4000 avatar Oct 01 '24 00:10 fsb4000

Alfred Agrell wrote:

https://262.ecma-international.org/#prod-SyntaxCharacter says

SyntaxCharacter :: one of ^ $ \ . * + ? ( ) [ ] { } |

PatternCharacter :: SourceCharacter but not SyntaxCharacter

so I believe that regex is, in fact, invalid, and browsers accepting it is some kind of compatibility extension.

But I agree it's complicated. Does the C++ spec ever say which version of ECMA-262 those regexes should match?

and

Jonathan Wakely answered

Oh right, I always forget that browsers have compatibility features for ECMAScript regexes: https://262.ecma-international.org/#sec-additional-ecmascript-features-for-web-browsers

Those are optional for non-browsers.

The C++ standard references Ecma International, ECMAScript2 Language Specification, Standard Ecma-262, third edition, 1999

So maybe both libstdc++ and libc++/MS STL are correct, because some features are optional in regex.

fsb4000 avatar Oct 02 '24 00:10 fsb4000

We talked about this at the weekly maintainer meeting and we agree that the error message can be easily improved.

@fsb4000 As this issue is about the error_badbrace message, discussion of other potential bugs should be done in a separate issue to avoid confusion. This is a good general principle to follow, but especially in things marked https://github.com/microsoft/STL/labels/good%20first%20issue which should be easy for new contributors to read and understand. Accordingly I'm going to collapse the unrelated comments.

StephanTLavavej avatar Oct 02 '24 21:10 StephanTLavavej

Will take a look into fixing this!

FranciscoThiesen avatar Oct 03 '24 23:10 FranciscoThiesen