
Should WIT keywords be escaped in the Canonical ABI?

Open Liamolucko opened this issue 3 years ago • 4 comments

Currently, the Canonical ABI is defined to simply insert function names as-is:

https://github.com/WebAssembly/component-model/blob/4d412d98372222979a7d5531cbb42c08961f596d/design/mvp/canonical-abi/definitions.py#L1109-L1112

It's the same for field names, param names, etc.

This means that functions which happen to have the same name as WIT keywords won't be escaped, and will be inserted directly.

Should they be %-escaped, to align with WIT? Or is it fine for the Canonical ABI to be different here?

Liamolucko avatar Aug 15 '22 01:08 Liamolucko

That's a good point. I wonder if it's feasible to modify the wit grammar such that name: is always parsed as an identifier and never as a keyword. That would eliminate much of the need for % escaping, and should avoid this issue as well.

For example, in func: func(func: i32) the first and third "func"s would be identifiers, because they're followed by colons.
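
The rule above can be sketched in a few lines of Python. This is only an illustration, not the actual WIT lexer: the keyword set, the token regex, and the `classify` helper are all hypothetical, and built-in type names such as i32 are simply treated as ordinary words. The only rule taken from the comment is that a word followed by a colon is always an identifier.

```python
import re

# Hypothetical, deliberately incomplete keyword set (illustration only).
KEYWORDS = {"func", "record", "variant", "enum", "type"}

# A word, or one of a few punctuation tokens, with leading whitespace skipped.
TOKEN_RE = re.compile(r"\s*([A-Za-z][A-Za-z0-9-]*|->|[():,])")


def tokenize(source):
    """Split source into raw token strings."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            if source[pos:].strip() == "":
                break  # only trailing whitespace is left
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def classify(source):
    """Return (kind, text) pairs using one token of lookahead.

    A word directly followed by ':' is always an identifier, even if it
    is spelled like a keyword.
    """
    tokens = tokenize(source)
    result = []
    for index, token in enumerate(tokens):
        following = tokens[index + 1] if index + 1 < len(tokens) else None
        if not token[0].isalpha():
            result.append(("punct", token))
        elif following == ":":
            result.append(("id", token))
        elif token in KEYWORDS:
            result.append(("keyword", token))
        else:
            result.append(("id", token))
    return result
```

Under these assumptions, classify("func: func(func: i32)") marks the first and third "func" as identifiers and the second as a keyword.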

sunfishcode avatar Aug 15 '22 14:08 sunfishcode

That's a really good question. I like Dan's idea because adding escaping to CABI means either baking in a list of wit keywords, which may grow over time, thereby backwards-incompatibly changing the CABI over time, or always %-escaping (analogous to how WAT always $-prefixes), which, since keywords aren't relevant in a CABI context, seems wholly unnecessary. Also I can imagine other places where we'll want to embed wit type syntax in the future and each would run into the same issue and so this escaping issue wouldn't just be limited to the CABI.

Thinking about how this looks in the WIT grammar, I think this changes our definition of token: we remove the separate case for keywords and instead say that keywords parse as identifiers. Then, when we see an identifier at the start of a top-level item, we require one token of lookahead to see if the next token is : (otherwise it'll match one of the "keyword" productions). In all the other (type syntax) contexts in which an identifier occurs, I think there are no keywords possible, so no lookahead is necessary. I assume this sort of reasoning is fraught with peril in general, but in the context of WIT it seems fine (but are those famous last words?). Additionally, I think this lets us simply remove the % escape syntax.

Other thoughts?

lukewagner avatar Aug 16 '22 20:08 lukewagner

func: func(func: func) being accepted by the grammar is surprising, but hopefully syntax highlighting will help if this ever comes up in practice.

sunfishcode avatar Aug 17 '22 22:08 sunfishcode

Your example made me realize I was probably over-optimistic before: independently of the CABI, %-escaping probably needs to stay in WIT, even if we make the above change: assuming we allow arbitrary type alias names in WIT (which, since type aliases end up producing programmer-visible names that appear as export/import names in a component, seems necessary), we'll need %-escapes to disambiguate between type alias names and built-in value type names. The CABI wouldn't ever need to use %-escapes, though, since the CABI will not be introducing or using type aliases.
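
As a rough illustration of the disambiguation described above, here is a minimal Python sketch. It is not taken from any WIT implementation: the `BUILTIN_TYPES` subset and both helper functions are hypothetical, and the sketch assumes that a leading % always forces a name to be read as a user-defined alias.

```python
# Hypothetical subset of built-in value type names (illustration only).
BUILTIN_TYPES = {"bool", "u8", "u32", "s32", "string"}


def resolve_type_name(text, aliases):
    """Resolve a name in type position to ("builtin", name) or ("alias", name).

    A leading '%' forces the alias reading; an unescaped name that matches
    a built-in type always means the built-in.
    """
    if text.startswith("%"):
        name = text[1:]
        if name not in aliases:
            raise KeyError(f"unknown type alias: {name}")
        return ("alias", name)
    if text in BUILTIN_TYPES:
        return ("builtin", text)
    if text in aliases:
        return ("alias", text)
    raise KeyError(f"unknown type: {text}")


def print_alias_reference(name):
    """Write an alias name, escaping it only when it collides with a built-in."""
    return "%" + name if name in BUILTIN_TYPES else name
```

With an alias set containing "string", the unescaped spelling string still resolves to the built-in, while %string resolves to the alias.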

lukewagner avatar Aug 19 '22 15:08 lukewagner