webassemblyjs wasm-parser/wasm-gen or wast-parser/wast-printer incorrectly handle "end" instruction in globals initialization

If I take the following .wat:

(module
  (global i32 (i32.const 0))
)

...and then

compile it with wat2wasm,
parse it to an AST with wasm-parser
and print it back to text format with wast-printer,

then I get the following, modified .wat code back:

(module
  (global i32 (i32.const 0)(end))
)

Notice the weird added (end) statement, which makes this code invalid (wat2wasm doesn't compile it).

Here is an end-to-end reproduction where I typed in the generated wasm by hand and also commented it. The wasm itself seems to be spec-compliant.

import {decode} from '@webassemblyjs/wasm-parser';
import {print} from '@webassemblyjs/wast-printer';

// prettier-ignore
let wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic header
  0x01, 0x00, 0x00, 0x00, // version
  0x06, 0x06, 0x01, // globals section with 6 bytes, 1 global
  // GLOBAL
    0x7f, // type i32
    0x00, // not mutable
    // initialiation
    0x41, 0x00, // i32.const 0
    0x0b, // end of initialization
]);

// decode with wasm-parser
let ast = decode(wasmBytes, {});

// print with wast-printer
console.log(print(ast));

/* OUTPUT:
(module
  (global i32 (i32.const 0)(end))
)
*/

I'm not 100% sure what goes wrong here, but looking at the AST, I noticed that it also contains an explicit "end" instruction node. Here is the value of ast.body[0].fields[0] in the code above:

{
  type: 'Global',
  globalType: { type: 'GlobalType', valtype: 'i32', mutability: 'const' },
  init: [
    {
      type: 'Instr',
      id: 'const',
      args: [Array],
      object: 'i32',
      loc: [Object]
    },
    { type: 'Instr', id: 'end', args: [], loc: [Object] } // <--- IS THIS INTENDED?
  ],
  name: { type: 'Identifier', value: 'global_0', raw: '' },
  loc: { start: { line: 3, column: 5 }, end: { line: 3, column: 28 } }
 }

This AST is different from the one which is produced when I take my original correct wat and parse it with this repo's wast-parser. In that case I don't get the end instruction:

import {parse} from '@webassemblyjs/wast-parser';
let ast = parse('(module (global i32 (i32.const 0)))');
console.log(ast.body[0].fields[0]);

{
  type: 'Global',
  globalType: { type: 'GlobalType', valtype: 'i32', mutability: 'const' },
  init: [
    {
      type: 'Instr',
      id: 'const',
      args: [Array],
      object: 'i32',
      loc: [Object]
    }
    // SHOULD THERE BE AND "END" HERE?
  ],
  name: { type: 'Identifier', value: 'global_0', raw: '' },
  loc: { start: { line: 3, column: 5 }, end: { line: 3, column: 28 } }
}

In other words, parse(...wat...) creates a different Global node than wat2wasm followed by decode(...wasm...).

As an additional data point, if I take this repo's wasm-gen and encode the Global node above (the one without the end instruction), the generated wasm bytes don't match with the one from the original (correct) wasm:

import {parse} from '@webassemblyjs/wast-parser';
import {encodeNode} from '@webassemblyjs/wasm-gen';
let ast = parse('(module (global i32 (i32.const 0)))');
console.log(encodeNode(ast.body[0].fields[0]));

This results in the following bytes for the global:

0x7f 0x00 0x41 0x00

While in the original wasm it had an additional byte at the end, the 0x0b which is the end instruction:

0x7f 0x00 0x41 0x00 0x0b

According to the spec, "each global is initialized with an init value given by a constant initializer expression" 1, and expressions "are sequences of instructions terminated by an end marker." 2.

So that 0x0b / end byte is definitely supposed to be there! That implies one of the following:

Either the AST is supposed to feature that end instruction in the global initialization. That would mean that both wast-parser creates the wrong AST, and wast-printer takes a correct AST (with the end instruction) and turn it into wrong wat (with an unnecessary (end) token).
Or the AST format should not feature the end instruction (i.e. the instruction could be implicit). That would mean that wasm-parser creates the wrong AST, and wasm-gen takes a correct AST and turns it into wrong bytecode.

To summarize, either the wast pipeline or the wasm pipeline is treating the end byte in the init part of a global declaration incorrectly :)

Thanks for this great library, btw! :heart:

Aug 20 '21 13:08 mitschabaude

I want to add that I'd be happy to dig into the code and make a PR that fixes this, however I don't know whether it's the wast pipeline or the wasm pipeline that should be fixed.

IMO, the end instruction should show up in the AST like every other instruction (even if it isn't part of the text format), which would mean that the wasm pipeline is fine and the wast pipeline should be fixed. In any case, I'm pretty sure that these two pipelines should be able to work together and be consistent in what kind of AST they create/assume.

Aug 23 '21 14:08 mitschabaude

Thanks for the the detailed report. At first glace I think the wast printer needs to changed to not print the end instruction where's not legal.

Aug 28 '21 19:08 xtuc

@xtuc another thing: I also noticed that the wasm-parser -> wast-printer pipeline generates wast code with "u32" and "u64" instead of "i32", "i64", e.g. the code will contain stuff like "u32.const" which then cannot be parsed by other tools.

If you want I could try to fix this as well in wast-printer in one PR together with the (end) issue

Aug 30 '21 15:08 mitschabaude

It would be nice if you can fix that inconsistency as well, but please in a sperated PR, that's going to be easier to review.

Aug 30 '21 20:08 xtuc