
`<regex>`: `R"([\d-e])"` should be rejected

Open • Alcaro opened this issue on Sep 30 '24 • 2 comments

Describe the bug

The regex `[\d-e]` (a character class containing the range from `\d` to `e`) is accepted and treated as `\d` plus the literal characters `-` and `e`. This is contrary to the ECMA-262 spec: `\d` isn't a single character, so it can't be used as a range endpoint.

Command-line test case

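```cpp
#include <cstdio>    // std::puts
#include <exception> // std::exception
#include <regex>

int main()
{
    // Range whose start is the class escape \d: should throw, but is accepted.
    try {
        std::regex r("[\\d-e]");
        std::puts("it's legal");
    } catch (const std::exception& e) {
        std::puts(e.what());
    }
    // Reversed range: correctly rejected with std::regex_error.
    try {
        std::regex r("[b-a]");
        std::puts("it's legal");
    } catch (const std::exception& e) {
        std::puts(e.what());
    }
}
```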

https://godbolt.org/z/oMvEr5YTs

Expected behavior

Both should be rejected (currently, only the latter, `[b-a]`, is).

STL version

Ask Godbolt

Additional context

Feel free to close this one as wontfix if you feel it has ossified into a vendor extension. As long as it's a conscious choice, I'm fine with either outcome.

Alcaro, Sep 30 '24 14:09

We talked about this at the weekly maintainer meeting, and we believe this is clearly a bug, as we should be following what ECMAScript specifies here. (Technically, the C++ Standard cites ECMAScript 3, but modern versions are written more clearly; we can refer to them as long as we don't accidentally pick up new features.)

As C++ Standard Library implementations have wildly varying behavior, @barcharcraz suggests checking what Chromium and Firefox do.

StephanTLavavej, Oct 02 '24 21:10

Careful about that: browsers have plenty of regex extensions that aren't part of ESv3 either.

And even if the C++ spec is updated to cite a newer ES version, the ECMAScript spec explicitly labels those as backward-compatibility extensions (Annex B) that non-browser implementations should omit.

Alcaro, Oct 02 '24 21:10