test262 icon indicating copy to clipboard operation
test262 copied to clipboard

[generation] Support comments directly following regular expression litterals

Open guijemont opened this issue 2 years ago • 5 comments

The generation tool does not properly substitute in cases such as:

/foo//*{ bar }*/

Though something like this works:

/foo/ /*{ bar }*/

It seems that find_comments() needs to learn about regexp literals and handle them properly.

See https://github.com/tc39/test262/pull/3807#issuecomment-1492571532

guijemont avatar Mar 31 '23 20:03 guijemont

FWIW this branch contains my (broken) attempt at fixing this, unfortunately I don't have more time to spend on this right now.

guijemont avatar Apr 01 '23 14:04 guijemont

I'm fine with fixing /foo//*{ bar }*/, but don't want to support syntactically invalid constructs like /(?/*{ foo }*/-/*{ bar }*/:a)/. I'd rather introduce a new style of procedural test generation with corresponding frontmatter key, even if that style remains inverted (in which templates are referenced by cases) like e.g.

/*---
esid: …
…
placeholder-prefix: ' %%'
placeholder-suffix: '%% '
---*/

/(? %%subpattern-add-modifiers%% - %%subpattern-remove-modifiers%% :a)/ %%global-modifiers%% ;

for accepting existing .case files like

//- subpattern-add-modifiers
i
//- subpattern-remove-modifiers
//- global-modifiers
gu

to produce content like /(?i-:a)/gu;.

gibson042 avatar Apr 03 '23 17:04 gibson042

I'm not sure I understand the difference - isn't /(? %%foo%% - %%bar%% :a)/ just as syntactically invalid as /(?/*{ foo }*/-/*{ bar }*/:a)/?

ptomato avatar Apr 04 '23 00:04 ptomato

It's invalid at a different layer of abstraction... /(?/*{ foo }*/-/*{ bar }*/:a)/ interferes with parsing of |RegularExpressionLiteral| in the lexical grammar because the first / in /*{ foo }*/ closes the literal, while /(? %%foo%% - %%bar%% :a)/ would be recognized as a single regular expression literal and wouldn't be rejected until application of the RegExp grammar (see Regular Expression Literals: Static Semantics: Early Errors). And the distinction is relevant because the test262 procedural test generation uses a degenerate tokenizer that is ignorant of the ECMAScript syntactic grammar.

gibson042 avatar Apr 10 '23 17:04 gibson042

After poking at this bug, I think I understand better why you are asking for this. It's simple enough to support /foo//*{ bar }*/ but without extra work to support placeholder comments inside RegExp literals, we actually lose the existing support for /(?/*{ foo }*/-/*{ bar }*/:a)/.

Doing this extra work to support placeholder comments inside RegExp literals, however, means that we get all sorts of other arbitrary cases where the /*{ }*/ syntax is never going to work. For example, /foo/*{ bar }*/baz/g would perform a substitution, but //*{ bar }*//g would not, and I don't see a foolproof way to distinguish //*{ bar }*//g from a regular if weird-looking single-line comment without knowing the template writer's intention.

So, in the end I agree that we should use some other tokens to delimit placeholders inside RegExp literals.

ptomato avatar Apr 14 '23 20:04 ptomato