[generation] Support comments directly following regular expression litterals
The generation tool does not properly substitute in cases such as:
/foo//*{ bar }*/
Though something like this works:
/foo/ /*{ bar }*/
It seems that find_comments() needs to learn about regexp literals and handle them properly.
See https://github.com/tc39/test262/pull/3807#issuecomment-1492571532
FWIW this branch contains my (broken) attempt at fixing this, unfortunately I don't have more time to spend on this right now.
I'm fine with fixing /foo//*{ bar }*/, but don't want to support syntactically invalid constructs like /(?/*{ foo }*/-/*{ bar }*/:a)/. I'd rather introduce a new style of procedural test generation with corresponding frontmatter key, even if that style remains inverted (in which templates are referenced by cases) like e.g.
/*---
esid: …
…
placeholder-prefix: ' %%'
placeholder-suffix: '%% '
---*/
/(? %%subpattern-add-modifiers%% - %%subpattern-remove-modifiers%% :a)/ %%global-modifiers%% ;
for accepting existing .case files like
//- subpattern-add-modifiers
i
//- subpattern-remove-modifiers
//- global-modifiers
gu
to produce content like /(?i-:a)/gu;.
I'm not sure I understand the difference - isn't /(? %%foo%% - %%bar%% :a)/ just as syntactically invalid as /(?/*{ foo }*/-/*{ bar }*/:a)/?
It's invalid at a different layer of abstraction... /(?/*{ foo }*/-/*{ bar }*/:a)/ interferes with parsing of |RegularExpressionLiteral| in the lexical grammar because the first / in /*{ foo }*/ closes the literal, while /(? %%foo%% - %%bar%% :a)/ would be recognized as a single regular expression literal and wouldn't be rejected until application of the RegExp grammar (see Regular Expression Literals: Static Semantics: Early Errors). And the distinction is relevant because the test262 procedural test generation uses a degenerate tokenizer that is ignorant of the ECMAScript syntactic grammar.
After poking at this bug, I think I understand better why you are asking for this. It's simple enough to support /foo//*{ bar }*/ but without extra work to support placeholder comments inside RegExp literals, we actually lose the existing support for /(?/*{ foo }*/-/*{ bar }*/:a)/.
Doing this extra work to support placeholder comments inside RegExp literals, however, means that we get all sorts of other arbitrary cases where the /*{ }*/ syntax is never going to work. For example, /foo/*{ bar }*/baz/g would perform a substitution, but //*{ bar }*//g would not, and I don't see a foolproof way to distinguish //*{ bar }*//g from a regular if weird-looking single-line comment without knowing the template writer's intention.
So, in the end I agree that we should use some other tokens to delimit placeholders inside RegExp literals.