Fix Regex Character Class Escape Tests
Character class escape tests for \d, \D, \s, \S, \w, \W are not testing what they should.
Consider the escape \d.
We should test two things:
- That it matches ALL digits (positive cases)
- That it matches NONE of the other code points (negative cases).
This is not what the auto-generated tests are currently doing.
character-class-digit-class-escape.js only checks that \d finds a match in "0123456789".
And character-class-digit-class-escape-plus-quantifier.js only checks that \d+ finds a match in "0123456789".
First, these two tests are equivalent.
But also, they do not check that \d matches all digits, only one.
And they never check that non-digits are not matched.
It seems that an implementation where \d matches only "3" and "z" would pass all the current tests.
I suggest the following:
- check positive cases by testing
^\d+$(with anchors) on the string"0123456789". - check negative cases by testing the absence of any match of
\don the string that contains all code points except digits. - do the same for each character class escape.
Thanks to Noé De Santo @Ef55 for realizing that these tests were probably not what they were intended to be, while working on the Warblre project.
cc @mathiasbynens, you might be interested in this.
CC @leobalter as the original author of these tests.
Thanks for kicking this off! Since these are generated tests, could you please update the source script that generates these files rather than editing them directly? It would be easier to review that patch.
I'll give it a try in the next few days!
I no longer have the code on any of my machines, the diff information points out to https://github.com/bocoup/test262-regexp-generator. I hope that helps.
Hi,
It seems like the tests currently in Test262 are not the same as the tests of the generator.
Currently, the tests in the generator check that, for instance, str.replace(/\d/g,'') === str.replace(/|0-9]/g,'') where str is the string of all code points.
I think these generator tests are fine: if \d matched too many things, then the string on the left side of the equality would be shorter.
And if \d didn't match enough characters, it would be longer.
So I believe these generator tests correctly check positive and negative cases! I still think that the tests for the plus quantifier are redundant, but one could argue that it's just an extra sanity check.
It seems to me like the reason for the differences between the two sets of tests is due to this commit, optimizing the tests by removing this str.replace comparison.
I see two ways forward:
- We could simply revert the optimization commit. The tests would be then agree with the generator. But probably this is too slow.
- I could modify the generator to produce the tests that I submitted in this PR. These tests do not do any long string comparison nor use
str.replace, so maybe this is fast enough?
What do you think?
I could modify the generator to produce the tests that I submitted in this PR. These tests do not do any long string comparison nor use
str.replace, so maybe this is fast enough?
IMHO this is the way to go.
Hello, I've created a pull request here: https://github.com/bocoup/test262-regexp-generator/pull/2
To generate the correct strings, I've used the regenerate.js file from this repo: https://github.com/mathiasbynens/unicode-property-escapes-tests/
I've also added a configuration with the v flag.
Best, Aurèle
We are considering copying bocoup/test262-regexp-generator into this repo. I'm not sure Bocoup is still maintaining it.
Makes sense! Let me know and I can update this PR when the generator is copied here.
I've copied and updated the generator script in https://github.com/tc39/test262/pull/4303. I had to make some modifications to it, to implement some changes that had been made directly to the test files and weren't reflected in the script. So any extra review is appreciated!
@Aurele-Barriere The generator script has landed in test262 now, so you can bring the improvements to the generator script into this PR. I'm afraid it will need a certain amount of rebasing, but at least now you can be certain that the script output matches the tests in the repo.
@Aurele-Barriere The generator script has landed in test262 now, so you can bring the improvements to the generator script into this PR. I'm afraid it will need a certain amount of rebasing, but at least now you can be certain that the script output matches the tests in the repo.
Great news thanks, I'll give it a try soon!
Hi @ptomato I ported my changes to the test generator (from https://github.com/bocoup/test262-regexp-generator/pull/2) to the one now in Test262 I didn't realize sycing my repo would close this PR automatically. So I opened a new one here https://github.com/tc39/test262/pull/4364