ecma262 Editorial: Generalize ParseText to support String input

A common pattern in the spec is ParseText(StringToCodePoints(_str_), |Nonterminal|), often with a new alias introduced just to hold the output of StringToCodePoints. The pattern is necessary because ParseText expects its first argument to be a sequence of code points, while the operations that call it have a String. I think it makes sense to simplify those call sites by allowing ParseText to accept the String directly and perform the conversion itself.
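As a rough illustration of the proposed behavior (not spec text), the sketch below models a ParseText that accepts either a String or a sequence of code points. Everything in it is an assumption made for illustration: a Python `str` stands in for an ECMAScript String, a list of integers stands in for a sequence of code points, and `TOY_GRAMMAR` is a hypothetical one-symbol grammar rather than the real ECMAScript grammar.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class ParseNode:
    symbol: str             # the goal symbol this node is an instance of
    code_points: List[int]  # the span of source text the node covers


def string_to_code_points(string: str) -> List[int]:
    """Interpret a String as UTF-16, keeping lone surrogates as themselves."""
    raw = string.encode("utf-16-le", "surrogatepass")
    units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    code_points: List[int] = []
    i = 0
    while i < len(units):
        lead = units[i]
        if (0xD800 <= lead <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # A well-formed surrogate pair becomes one supplementary code point.
            code_points.append(
                0x10000 + ((lead - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # BMP code unit or unpaired surrogate: passed through unchanged.
            code_points.append(lead)
            i += 1
    return code_points


# Hypothetical toy grammar: a single goal symbol recognized by a regex.
TOY_GRAMMAR = {"DecimalDigits": re.compile(r"[0-9]+")}


def parse_text(source_text: Union[str, Sequence[int]], goal_symbol: str):
    """Return a ParseNode, or a non-empty list of SyntaxError objects."""
    if isinstance(source_text, str):
        # The generalization: convert a String here instead of at each call site.
        source_text = string_to_code_points(source_text)
    text = "".join(chr(cp) for cp in source_text)
    if TOY_GRAMMAR[goal_symbol].fullmatch(text):
        return ParseNode(goal_symbol, list(source_text))
    return [SyntaxError(f"text does not match {goal_symbol}")]
```

With this shape, a caller holding a String writes `parse_text("123", "DecimalDigits")` and no longer needs an intermediate alias for the code points, while callers that already hold code points are unaffected.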

Jun 14 '23 17:06 gibson042

Nice improvement. While we're talking ParseText, should this PR also reify that "ParseText(_x_, |Foo|) is a Parse Node" can be shortened to "_x_ is a |Foo|"? This text is already used in a few places in 262, for example:

String that when processed is a |FunctionBody|
Assert: _lhs_ is a |LeftHandSideExpression|.
Assert: _d_ is a |FunctionDeclaration|.

But it might be nice to make it official?

Jun 15 '23 05:06 justingrant

Hmm, I think that would need more discussion. In the text that you cite from Strict Mode Code and ForIn/OfBodyEvaluation and Changes to BlockDeclarationInstantiation, the values that are being compared against |FunctionBody| or |LeftHandSideExpression| or |FunctionDeclaration| are not Strings or even sequences of code points, but rather Parse Nodes for which such comparison is defined by The Syntactic Grammar:

Each Parse Node is an instance of a symbol in the grammar; it represents a span of the source text that can be derived from that symbol.

If I understand correctly, what you're suggesting would be a new shorthand for checking whether a sequence of code points (possibly derived by interpreting a String as WTF-16 per The String Type) can be successfully parsed with a specific goal symbol, but in a way that doesn't return the corresponding Parse Node. That doesn't seem in scope for this PR, but I'm also skeptical that it makes sense to include anyway—generally speaking, we should want the resulting Parse Node so it can be used in a subsequent step (as seems to be the case for every current call site except IsTimeZoneOffsetString, and I think that deviation should be addressed by folding it into ParseTimeZoneOffsetString—see also IsValidRegularExpressionLiteral and Parse, don’t validate).
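To make the "parse, don't validate" point concrete, here is a minimal sketch in which the validity check is derived from the parser rather than maintained as a separate operation. The grammar is a deliberately simplified, hypothetical `+HH:MM` / `-HH:MM` form, not the spec's actual offset grammar, and the function names are invented for illustration.

```python
import re
from typing import Optional

# Hypothetical, simplified offset grammar: sign, two-digit hour, colon, two-digit minute.
_OFFSET = re.compile(r"([+-])([01][0-9]|2[0-3]):([0-5][0-9])")


def parse_offset_minutes(text: str) -> Optional[int]:
    """Parse once and return the offset in minutes, or None if text is not an offset."""
    match = _OFFSET.fullmatch(text)
    if match is None:
        return None
    sign = 1 if match.group(1) == "+" else -1
    return sign * (int(match.group(2)) * 60 + int(match.group(3)))


def is_offset(text: str) -> bool:
    """Validation falls out of parsing, so the two can never disagree."""
    return parse_offset_minutes(text) is not None
```

A caller that needs the value calls `parse_offset_minutes` and branches on the result, so the text is matched against the grammar only once.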

Jun 15 '23 06:06 gibson042

we should want the resulting Parse Node so it can be used in a subsequent step

Agree that this is and should be true for algorithm steps. But for assertions and editorial prose, the goal isn't to actually parse something. For those cases, "Assert: _x_ is a |Foo|." seems clearer than "Assert: ParseText(_x_, |Foo|) returns a Parse Node." or "Assert: ParseText(_x_, |Foo|) does not return a list of errors."

Regardless, out of scope here so I'll be quiet now. :-)

Jun 15 '23 16:06 justingrant

An alternative would be to leave ParseText as is, and introduce (say) ParseString to encapsulate ParseText(StringToCodePoints(_string_), _nonterminal_).
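A minimal sketch of that alternative, under loud assumptions: `parse_text` here is a toy stand-in that accepts only a list of code points and knows a single hypothetical goal symbol, and `string_to_code_points` uses `ord` on a Python `str`, which matches StringToCodePoints only when the string contains no surrogate code units.

```python
import re
from typing import List, NamedTuple, Union


class ParseNode(NamedTuple):
    symbol: str
    code_points: List[int]


def string_to_code_points(string: str) -> List[int]:
    # Simplified stand-in: assumes the str holds no surrogate code units.
    return [ord(ch) for ch in string]


def parse_text(code_points: List[int], nonterminal: str) -> Union[ParseNode, list]:
    """Left as is: accepts only a sequence of code points."""
    if isinstance(code_points, str):
        raise TypeError("parse_text expects code points, not a String")
    text = "".join(chr(cp) for cp in code_points)
    # Hypothetical toy grammar with one goal symbol.
    if nonterminal == "DecimalDigits" and re.fullmatch(r"[0-9]+", text):
        return ParseNode(nonterminal, list(code_points))
    return [SyntaxError(f"text does not match {nonterminal}")]


def parse_string(string: str, nonterminal: str) -> Union[ParseNode, list]:
    """The wrapper: encapsulates the conversion so call sites need no alias."""
    return parse_text(string_to_code_points(string), nonterminal)
```

The trade-off relative to generalizing ParseText is one extra operation name in exchange for keeping each operation's parameter type fixed.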

Oct 28 '23 18:10 jmdyck
