wabt
[wasm-validate] Wrong binary location reported in error message
Hi,
I believe that wasm-validate reports the wrong binary location in its error messages.
Example
Validating the following file, generated via wast2json from the spec test i32.wast,
$ xxd test/core/wasm/i32/i32.wast.8.wasm
00000000: 0061 736d 0100 0000 0104 0160 0000 0302 .asm.......`....
00000010: 0100 0a0e 010c 0041 0002 4045 0e00 001a .......A..@E....
00000020: 0b0b ..
$ wasm-objdump -d test/core/wasm/i32/i32.wast.8.wasm
i32.wast.8.wasm: file format wasm 0x1
Code Disassembly:
000016 func[0]:
000017: 41 00 | i32.const 0
000019: 02 40 | block
00001b: 45 | i32.eqz
00001c: 0e 00 00 | br_table
00001f: 1a | drop
000020: 0b | end
000021: 0b | end
with the wasm-validate program results in the following error message:
$ wasm-validate test/core/wasm/i32/i32.wast.8.wasm
test/core/wasm/i32/i32.wast.8.wasm:000001c: error: type mismatch in i32.eqz, expected [i32] but got []
test/core/wasm/i32/i32.wast.8.wasm:0000021: error: type mismatch in function, expected [] but got [i32]
However, as can be seen in the object dump above, the i32.eqz instruction (hex opcode 45) is at offset 000001b, not 000001c as stated in the error message. Interestingly, the second error message displays the correct binary location.
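The offsets can be double-checked directly against the hex dump. The snippet below is only a small sanity check: it rebuilds the file from the xxd output shown above (the variable names are mine) and prints the bytes at the two offsets in question.

```python
# Bytes of i32.wast.8.wasm, copied from the xxd dump above.
HEX_DUMP = """
0061 736d 0100 0000 0104 0160 0000 0302
0100 0a0e 010c 0041 0002 4045 0e00 001a
0b0b
"""
data = bytes.fromhex("".join(HEX_DUMP.split()))

expected = 0x1B  # offset shown by wasm-objdump for i32.eqz
reported = 0x1C  # offset printed in the wasm-validate error

print(f"byte at {expected:#x}: {data[expected]:#04x}")  # 0x45, the i32.eqz opcode
print(f"byte at {reported:#x}: {data[reported]:#04x}")  # 0x0e, the br_table opcode
```

So the reported offset 0x1c points at the first byte after the one-byte i32.eqz instruction, which is the br_table opcode.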
Expected Output
To clarify, I would have expected the following output:
$ wasm-validate test/core/wasm/i32/i32.wast.8.wasm
test/core/wasm/i32/i32.wast.8.wasm:000001b: error: type mismatch in i32.eqz, expected [i32] but got []
test/core/wasm/i32/i32.wast.8.wasm:0000021: error: type mismatch in function, expected [] but got [i32]
Additional Information
The spec interpreter reports the correct binary location:
$ wasm test/core/wasm/i32/i32.wast.8.wasm
../spec/core/wasm/i32/i32.wast.8.wasm:0x1b: invalid module: type mismatch: operator requires [i32] but stack has []
Thanks for reporting this. It's tedious to get the correct offsets, but it is worth doing :)
This seems to be because the reported offset for errors is at the end of the decoded instruction. Looking at fixing this, I see two options:
1. Keep track of an additional offset (for the beginning of the current instruction) and use that for error reporting.
2. Calculate the correct offset by taking the "end-of-position" and working backwards.
(1) has some amount of constant overhead, while (2) introduces a bit of complexity (and likely this will not be the last bug about error offsets). Curious for @binji's thoughts about whether (1) is acceptable or if (2) is likely to be simpler than I imagine.
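For illustration, here is a rough sketch of what the first option amounts to. It is a toy decoder in Python, not wabt's actual reader: the function names and the opcode table are hypothetical, it handles only the opcodes that appear in the example function body, and it assumes the block type is a single byte, as it is in this file.

```python
def read_u32_leb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

# Only the opcodes used by the example function body (hypothetical table).
NAMES = {0x02: "block", 0x0B: "end", 0x0E: "br_table",
         0x1A: "drop", 0x41: "i32.const", 0x45: "i32.eqz"}

def decode(data, pos, end):
    """Yield (start, end, name) for each instruction in data[pos:end]."""
    while pos < end:
        start = pos          # remember where the instruction begins
        opcode = data[pos]
        pos += 1
        if opcode == 0x41:   # i32.const: one LEB128 immediate (signed, same length rule)
            _, pos = read_u32_leb128(data, pos)
        elif opcode == 0x02: # block: assumed single-byte block type
            pos += 1
        elif opcode == 0x0E: # br_table: target count, targets, default target
            count, pos = read_u32_leb128(data, pos)
            for _ in range(count + 1):
                _, pos = read_u32_leb128(data, pos)
        yield start, pos, NAMES[opcode]

# Function body bytes from the dump, starting at file offset 0x17.
body = bytes.fromhex("41000240450e00001a0b0b")
BASE = 0x17
for start, stop, name in decode(body, 0, len(body)):
    print(f"start={BASE + start:#04x} end={BASE + stop:#04x} {name}")
```

For i32.eqz this prints start=0x1b and end=0x1c. The extra cost is one saved integer per instruction, and an error raised while handling the instruction can cite `start` instead of the current read position.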
(1) seems easier to me, and I bet the constant overhead is small. (2) requires some subtlety: it's easy in most cases (given an instruction, we nearly always know the immediates expected), but the size of the immediates is not easy to determine. As a simple example, local.get always has one index immediate, but that immediate can be between 1 and 5 bytes. Not only that, but it's valid for LEB128 values to be encoded non-minimally, meaning you can't easily determine the encoded size from the value.
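To make the non-minimal encoding point concrete, here is a minimal sketch (the helper name is mine, not a wabt function) that decodes three different byte sequences which all encode the index 0:

```python
def decode_u32_leb128(data):
    """Decode an unsigned LEB128 value; return (value, encoded_size)."""
    result = shift = size = 0
    while True:
        byte = data[size]
        size += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, size
        shift += 7

# Minimal, padded to 2 bytes, and padded to the 5-byte maximum for a u32.
for encoding in (b"\x00", b"\x80\x00", b"\x80\x80\x80\x80\x00"):
    value, size = decode_u32_leb128(encoding)
    print(f"{encoding.hex()}: value={value} size={size}")
```

All three decode to the value 0, with sizes 1, 2, and 5, so working backwards from the end offset would need the encoded length as well as the decoded value.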
I intend to use the wabt repo as a disassembler plugin, and I am running into this exact issue. I will implement a version that suits my own purposes, but it would be great to have this properly supported in the future. I can share my adaptation when I'm done, though I don't know whether it will match what you have in mind.