`<regex>`: `R"([\d-e])"` should be rejected
Describe the bug
The regex [\d-e] (a character class containing a range from \d to e) is accepted and treated as \d plus the literal characters - and e. This is contrary to the ECMA-262 spec: \d denotes a set of characters rather than a single character, so it can't be used as a range endpoint.
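For illustration, the rule can be sketched as a standalone check over the contents of a bracket expression. This is a simplified sketch, not the library's parser: `ClassAtom`, `read_atom`, and `class_ranges_valid` are hypothetical names, only `\d \D \w \W \s \S` are treated as class escapes, and every other escape is assumed to stand for the escaped character itself.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// One parsed class atom: a single character, or a multi-character set such as \d.
struct ClassAtom {
    bool is_single; // false for class escapes like \d, \w, \s
    char ch;        // meaningful only when is_single is true
};

// Reads one atom starting at body[pos] and advances pos.
// Returns nullopt for a dangling backslash at the end of the input.
std::optional<ClassAtom> read_atom(std::string_view body, std::size_t& pos) {
    const char c = body[pos++];
    if (c != '\\') {
        return ClassAtom{true, c};
    }
    if (pos == body.size()) {
        return std::nullopt;
    }
    const char e = body[pos++];
    switch (e) {
    case 'd': case 'D': case 'w': case 'W': case 's': case 'S':
        return ClassAtom{false, '\0'}; // a set of characters, not one character
    default:
        return ClassAtom{true, e}; // simplification: any other escape is the literal character
    }
}

// Takes the text between '[' and ']' and returns false if any range is ill-formed.
bool class_ranges_valid(std::string_view body) {
    std::size_t pos = 0;
    while (pos < body.size()) {
        const auto lo = read_atom(body, pos);
        if (!lo) {
            return false;
        }
        // A '-' followed by another atom forms a range; a trailing '-' is a literal.
        if (pos + 1 < body.size() && body[pos] == '-') {
            ++pos;
            const auto hi = read_atom(body, pos);
            if (!hi) {
                return false;
            }
            // Both endpoints must be exactly one character, so \d-e is rejected here.
            if (!lo->is_single || !hi->is_single) {
                return false;
            }
            // Endpoints must be in order, so b-a is rejected here.
            if (static_cast<unsigned char>(lo->ch) > static_cast<unsigned char>(hi->ch)) {
                return false;
            }
        }
    }
    return true;
}
```

Under these assumptions, `class_ranges_valid("\\d-e")` and `class_ranges_valid("b-a")` both return false, while `class_ranges_valid("\\d-")` returns true because the trailing `-` is a literal.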
Command-line test case
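#include <cstdio>
#include <exception>
#include <regex>

int main()
{
    // A class escape as a range endpoint: should be rejected, but is accepted.
    try {
        std::regex r("[\\d-e]");
        std::puts("it's legal");
    } catch (const std::exception& e) {
        std::puts(e.what());
    }
    // A reversed range: correctly rejected.
    try {
        std::regex r("[b-a]");
        std::puts("it's legal");
    } catch (const std::exception& e) {
        std::puts(e.what());
    }
}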
https://godbolt.org/z/oMvEr5YTs
Expected behavior
Both should be rejected (currently, only the latter is).
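When comparing implementations, it may help to see which error code (if any) a pattern produces instead of only the `what()` text. The helper below is a minimal sketch; `compile_error` is a hypothetical name, while `std::regex_error::code()` and `std::regex_constants::error_type` are standard.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: returns the error code thrown while compiling `pattern`
// with the ECMAScript grammar, or std::nullopt if the implementation accepts it.
std::optional<std::regex_constants::error_type> compile_error(const std::string& pattern) {
    try {
        std::regex r(pattern, std::regex_constants::ECMAScript);
        return std::nullopt;
    } catch (const std::regex_error& e) {
        return e.code();
    }
}
```

With this helper, `compile_error("[b-a]")` should hold an error code, and the expectation in this report is that `compile_error("[\\d-e]")` would hold one as well. What it actually returns depends on the Standard Library implementation in use.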
STL version
Ask Godbolt
Additional context
Feel free to close this one as wontfix, if you feel it's ossified into a vendor extension. As long as it's a conscious choice, I'm fine with whichever outcome.
We talked about this at the weekly maintainer meeting and we believe that this is clearly a bug, as we should be following what ECMAScript specifies here. (Technically the C++ Standard cites ECMAScript 3, but modern versions are written in a clearer way - we can refer to them as long as we don't accidentally pick up new features.)
As C++ Standard Library implementations have wildly varying behavior, @barcharcraz suggests checking what Chromium and Firefox do.
Careful about that - browsers have plenty of regex extensions that aren't part of ESv3 either.
And even if the C++ spec is updated to cite a newer ES version, the ES spec explicitly calls out that these are backwards-compatibility extensions that non-browser implementations should omit.