Microproposal: numeric shorthands in text format
I couldn't find a proposal repo for the text format, so filing it here.
Looking at https://github.com/WebAssembly/threads/pull/50/files, the 'i32.const' and 'i64.const' operations that end up everywhere tend to obscure the purpose of the code and reduce its readability.
IMO we should allow for at least the following lexing shorthands in the text format to reduce visual noise.
0_i32 equivalent to the token pair i32.const 0
0_i64 equivalent to i64.const 0
0.5_f32 equivalent to f32.const 0.5
0.5_f64 equivalent to f64.const 0.5
(I also think i, L, f, and d should be legal shorthand suffixes but I expect that might be more controversial, and paired with the underscores allowed by Andreas's number syntax and hex values, it might even be ambiguous. A different separator (0@d) might fix that, of course.)
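To make the four suffixed forms listed above concrete, here is a minimal sketch of the desugaring as a token-level rewrite, written in Python purely for illustration. The names SHORTHAND, desugar, and desugar_all are hypothetical, and the sketch assumes the input has already been split into tokens. It does not validate that the literal fits the type, and it ignores the hex ambiguity mentioned above.

```python
import re

# Assumption: only the four suffixes proposed above are recognized.
# The literal part is taken greedily, so the type is whatever follows
# the LAST underscore in the token.
SHORTHAND = re.compile(r"^(?P<value>.+)_(?P<type>i32|i64|f32|f64)$")

def desugar(token):
    """Expand a shorthand such as '0_i32' into the token pair
    ['i32.const', '0']. Any other token is returned unchanged."""
    m = SHORTHAND.match(token)
    if m is None:
        return [token]
    return [m.group("type") + ".const", m.group("value")]

def desugar_all(tokens):
    """Apply desugar to every token and flatten the result."""
    out = []
    for tok in tokens:
        out.extend(desugar(tok))
    return out
```

Since the rewrite happens on tokens, names like get_local pass through untouched because the text after their last underscore is not one of the four type names.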
Looking at the example you pointed out:
(i32.atomic.rmw.cmpxchg
  (i32.const 0)   ;; lock address
  (i32.const 0)   ;; expected value (0 => unlocked)
  (i32.const 1))  ;; replacement value (1 => locked)
...
(i32.wait
  (i32.const 0)   ;; lock address
  (i32.const 1)   ;; expected value (1 => locked)
  (f64.const inf))
Would become:
(i32.atomic.rmw.cmpxchg 0_i32 0_i32 1_i32)
...
(i32.wait 0_i32 1_i32 inf_f64)
Assuming that inf, nan, etc. can use this suffix as well.
With proper syntax highlighting this could be nice. As it is, I find it a little noisy, considering the suffix is longer than the number. I guess that could be improved with the shorter suffixes though.
Also, isn't this already ambiguous if underscores are allowed: e.g. 0x123_f32? Seems as though we'd need a different separator, as you've already mentioned. Maybe :?
(i32.atomic.rmw.cmpxchg 0:i32 0:i32 1:i32)
...
(i32.wait 0:i32 1:i32 inf:f64)
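The ambiguity can be checked mechanically. Below is a rough Python sketch, with a hypothetical helper named readings and a regex that only approximates hex literals with underscore digit separators (it is not the spec grammar). It lists every way a token could be read for a given separator.

```python
import re

# Approximation of a hex integer with '_' allowed between digit groups.
HEX_LITERAL = re.compile(r"^0x[0-9a-fA-F]+(_[0-9a-fA-F]+)*$")
TYPES = ("i32", "i64", "f32", "f64")

def readings(token, sep):
    """Return every interpretation of `token`, where `sep` is the
    character separating a literal from its type suffix."""
    found = []
    # Reading 1: the whole token is a plain hex literal.
    if HEX_LITERAL.match(token):
        found.append(("hex literal", int(token, 16)))
    # Reading 2: literal + separator + type suffix.
    head, hit, tail = token.rpartition(sep)
    if hit and head and tail in TYPES:
        found.append((tail + ".const", head))
    return found
```

Under these assumptions readings("0x123_f32", "_") comes back with two entries, since f, 3, and 2 are all hex digits, while readings("0x123:f32", ":") comes back with one. Only the f32 and f64 suffixes collide this way, because i is not a hex digit.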
I totally understand the desire, but is it really worth going down that
road just yet? The text format isn't meant to be written by hand at a
larger scale. The point of S-expressions is that they are simple and
regular, nodes corresponding 1-to-1 to Wasm instruction names, and
immediates are directly distinguishable from operand nodes. It's a slippery
slope starting to single out particular instructions and represent them in
some other ad-hoc form. For example, get_local contributes at least as
much to verbosity, would that be next? And why stop there? Once you start,
you quickly get into the business of designing an alternate concrete
syntax. And something like that should rather be done holistically.
Well, the text format is already more than an s-expression format, and the folded forms actually hide the instructions already (e.g. else and end). i32.const is consistently one of the most common opcodes; I think it's reasonable to consider streamlining it for clarity, just as we did with the folded form. And you're right: get_local is more common still, and having special syntax for it makes sense to me for the same reason.
I don't buy the slippery slope argument here; looking at AngryBots.wasm, 26% of the instructions are get_local and 16% are i32.const; the results are similar for bb.wasm. We could cover a lot of ground with just these two instructions (and their relatives for consistency) and stop. I agree that {get,set,tee}_local are starting to get into language design territory, but the constants seem like a pretty easy win.
Hm, I'm not sure I follow that argument. I could see us caring about code size numbers if we were talking about a transport format. But the text format isn't meant for that, and terseness isn't a goal by itself. I agree that clarity matters, but I probably disagree on whether this proposal helps or harms it from a holistic perspective: you're right that the text format isn't just S-expressions, but it names instructions 1-to-1, which this proposal would break.
I just meant that these instructions are very common, so it is valuable to a developer to optimize the representation of these instructions for viewing. In particular, if we make it part of the text format then it is legal for browser devtools to display it this way.
I don't agree that having a 1-to-1 mapping of instructions in the s-expression format is a useful property. My understanding is that the purpose of the folded s-expression format is that it can be easier to read. One of the differences is that the written order no longer matches the order of evaluation, e.g.:
(call $a               ;; evaluated 2nd
  (call $b)            ;; evaluated 1st
)
(call $c               ;; 7th
  (call $d             ;; 6th
    (i32.add           ;; 5th
      (i32.const 12)   ;; evaluated 3rd
      (call $e)        ;; 4th
    )
  )
)
which in the linear form becomes
call $b ;; evaluated 1st
call $a ;; evaluated 2nd
i32.const 12 ;; evaluated 3rd
call $e ;; 4th
i32.add ;; 5th
call $d ;; 6th
call $c ;; 7th
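The reordering between the two forms is just a post-order walk of the folded tree: operands first, then the instruction. A minimal Python sketch, assuming comments have been stripped and only plain instructions appear (no block, loop, or if, which fold differently); parse, unfold, and linearize are hypothetical names.

```python
def parse(text):
    """Parse a sequence of s-expressions into nested lists of tokens."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new nested expression
        elif tok == ")":
            done = stack.pop()        # close it and attach to its parent
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    return stack[0]

def unfold(node):
    """Emit the operands of `node` (its nested lists) in order, then the
    instruction itself together with its immediates (its plain tokens)."""
    lines = []
    for child in node:
        if isinstance(child, list):
            lines.extend(unfold(child))
    lines.append(" ".join(tok for tok in node if isinstance(tok, str)))
    return lines

def linearize(text):
    """Convert folded text into a list of linear-form instructions."""
    out = []
    for expr in parse(text):
        out.extend(unfold(expr))
    return out
```

Feeding it the folded example above (without the comments) yields the same seven instructions in the same order as the linear listing.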
I guess what I'm saying is that if you do care about representing the binary format directly, you probably aren't using any of the text format sugars anyway, so having syntactic sugar for our most common opcodes is completely reasonable.