<regex>: Goofy message for error_badbrace
Describe the bug
The error_badbrace string is significantly goofy looking
Command-line test case
#include <regex>
int main()
{
try {
std::regex r("a{4,3}");
puts("it's legal");
} catch (std::exception& e) {
puts(e.what());
}
}
https://godbolt.org/z/Y14Exavx7
Expected behavior
The expression contained an invalid range in a {} expression.
Actual behavior
The expression contained an invalid range in a { expression }.
Why was the closing brace moved one word to the right? That's neither consistent with the C++ spec, nor with my understanding of English grammar.
STL version
Don't know, ask Godbolt
Additional context
Probably a https://github.com/microsoft/STL/labels/good%20first%20issue
Yes, I agree. Code: https://github.com/microsoft/STL/blob/faccf0084ed9b8b58df103358174537233b178c7/stl/inc/regex#L500-L502
Why was the closing brace moved one word to the right?
Probably a copy-paste error. Look at the messages above: https://github.com/microsoft/STL/blob/faccf0084ed9b8b58df103358174537233b178c7/stl/inc/regex#L491-L499
https://godbolt.org/z/rKrbha61c https://godbolt.org/z/bMdzrEhnv https://godbolt.org/z/c8x4oTGEx
Click to expand unrelated discussion of other regex behavior:
By the way, we found a bug in the GCC standard library. I will check if that bug has been reported by someone; if not, I will create an issue. An example with the bug: https://godbolt.org/z/c8x4oTGEx
Looks like GCC accepts } as a literal character. It does the same with ] (but not )). https://godbolt.org/z/sds8jf9W6
Combine that with https://godbolt.org/z/oMvEr5YTs, and we've got accepts-invalid bugs in all three regex engines. Fun. (libc++ reported as https://github.com/llvm/llvm-project/issues/110339, MS-STL ... I think it's ossified into a vendor extension, but I might as well report it properly so we have a proper decision on it.)
I've created: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116903
They answered that this is a valid regex. and Google Chrome and Firefox think the same.
alert(RegExp("a{0}}").test("}"));
Both browsers print "true".
Alfred Agrell wrote:
https://262.ecma-international.org/#prod-SyntaxCharacter says
SyntaxCharacter :: one of ^ $ \ . * + ? ( ) [ ] { } |
PatternCharacter :: SourceCharacter but not SyntaxCharacter
so I believe that regex is, in fact, invalid, and browsers accepting it is some kind of compatibility extension.
But I agree it's complicated. Does the C++ spec ever say which version of ECMA-262 those regexes should match?
and
Jonathan Wakely answered
Oh right, I always forget that browsers have compatibility features for ECMAScript regexes: https://262.ecma-international.org/#sec-additional-ecmascript-features-for-web-browsers
Those are optional for non-browsers.
The C++ standard references Ecma International, ECMAScript2 Language Specification, Standard Ecma-262, third edition, 1999
So maybe both libstdc++ and libc++/MS STL are correct, because some features are optional in regex.
We talked about this at the weekly maintainer meeting and we agree that the error message can be easily improved.
@fsb4000 As this issue is about the error_badbrace message, discussion of other potential bugs should be done in a separate issue to avoid confusion. This is a good general principle to follow, but especially in things marked https://github.com/microsoft/STL/labels/good%20first%20issue which should be easy for new contributors to read and understand. Accordingly I'm going to collapse the unrelated comments.
Will take a look into fixing this!