wabt Add some optional defines to disable wasm2c trap checks

WASM_RT_NO_MEMORY_CHECKS: Prevents checks from being emitted for memory accesses being in bounds.
WASM_RT_NO_STACK_DEPTH_CHECKS: Prevents checks from being emitted for exhausting the stack.

When compiling a commandline tool like say wasm-opt, those are not relevant, and they do add significant overhead both to binary size and to speed.

Thoughts?

May 18 '20 22:05 kripken

Can you say more about the motivation for adding WASM_RT_NO_MEMORY_CHECKS? A core part of the WebAssembly ecosystem is that the sandbox is always enabled (outside of debugging an implementation or experimentation). When the sandbox doesn't work or is too slow, we work to find ways to make the sandboxing better rather than disabling it.

May 20 '20 01:05 sunfishcode

Oh, I wrote this in the other PR just now, but here is the reason: If you compile with wasm2c into a native executable, and then run it from the commandline, the OS will have it sandboxed in a process anyhow. There won't be anything else in the process with it to corrupt (except maybe the little "runtime" in main.c).

Running the emscripten benchmark suite with wasm2c, these checks make a big difference. Often wasm2c is slower than wasm, but faster than wasm and often comparable to a normal native build, when not doing these checks.

Obviously this should not be on by default! And obviously for a library (and not a commandline utility) it would be a bad idea.

May 20 '20 02:05 kripken

A wasm program is running without linear memory sandboxing can read and write to around 8 GiB of host process address space. Depending on the address space layout, this could be enough to do something which triggers a ROP, and in that case the program can probably break out of a typical OS process.

The WebAssembly design had several choices as to the behavior of out-of-bounds linear memory accesses, including masking/wrapping and nondeterministic behavior, and deliberately chose deterministic trapping, even with the awareness that some implementations would have overhead. If this has turned out to be too burdensome for important implementations, it would be appropriate to take the data to the CG.

I look to WebAssembly-organization projects such as wabt to lead by example in following the spec. There are many parts of the spec which could be ignored to achieve significant benchmark wins on some implementations today. But that's not WebAssembly.

May 20 '20 05:05 sunfishcode

I definitely agree with you in general @sunfishcode . However, the use case I have here (as mentioned in #1434) is compiling trusted code using wasm2c, where a ROP isn't relevant.

I agree that that is not a standard WebAssembly use case! But wasm toolchain components, like wasm2c here, are potentially useful for more than the standard things, and optional support for those would be cool.

May 20 '20 15:05 kripken

I agree that a better solution (at least for wasm-opt) in general would be to provide a cheaper memory sandbox (a la trap handling or the like). It should be possible to do that in wasm2c at some point.

That said, I have experimented with disabling these checks before too (for performance analysis), so I can see the value. We certainly shouldn't encourage their use for most programs, though I wonder how much worse the security is compared to a native executable.

@sunfishcode Perhaps as a compromise we should avoid mentioning these defines in the docs? Or maybe change their names UNSAFE_DISABLE_MEMORY_CHECKS etc.?

May 20 '20 19:05 binji

I agree that for the specific use case @kripken describes here where the program is trusted anyway, security is not a concern. And I agree that performance experimentation is cool, though it's pretty easy for people to experiment by editing the code.

I don't think a project as prominent as wabt should have options that intentionally violate WebAssembly standards, for the purpose of allowing third-party projects to depend on them.

May 21 '20 22:05 sunfishcode

@sunfishcode OK, but what it is the path forward here? I guess you could say that for @kripken's use case, it's up to them to run a sed script over the output, or modify the output manually. That feels like a worse outcome, but maybe it's OK for now.

Let's do this -- for now, let's hold off on this PR. I'm interested in adding trap-handling to wasm2c anyway (it'll be useful for experimenting with memory64 sandboxing too), which will make this PR moot. I've added an issue for this here: https://github.com/WebAssembly/wabt/issues/1440

May 26 '20 20:05 binji

Trap handling will help a lot in many cases, but I don't think it would make this PR moot. There are places where trap handlers aren't an option for various reasons (32-bit systems, places without full OS permissions).

May 26 '20 20:05 kripken

@kripken Good point. Here's another slightly less opinionated option -- we change the various defines (MEMCHECK, FUNC_PROLOGUE) to check whether they're already defined. In practice this is not much different, but it is less like an official option provided by wasm2c this way.

May 26 '20 21:05 binji

I'd be ok with checking for the defines. Sounds ok for now.

But in the long term I think we should resolve this debate. For example, it would be interesting if wasm2wat could turn

boundscheck(x)
load(x)
..no branches..
boundscheck(x + 4)
load(x + 4)
..no branches..
boundscheck(x + 8)
load(x + 8)

into

boundscheck(x + 8)
load(x)
..no branches..
load(x + 4)
..no branches..
load(x + 8)

which saves 2 checks. This changes wasm semantics slightly since we can trap earlier (for that reason I believe C compilers can't do it automatically). This could be a large optional speedup which we can't do with a previous #define. It would be unfortunate if we couldn't do this kind of thing in wasm2wat and had to resort to some kind of post-processing tool.

May 27 '20 15:05 kripken

I appreciate the creativity, but ultimately, checking whether MEMCHECK is already defined would still be setting up an extension point for the purpose of letting downstream dependencies alter wasm spec requirements.

(If a general-purpose tracing or debugging extension point is desired, such a thing should be designed to be independent of the sandboxing mechanism anyway.)

An expectation of spec conformance is one of the tools the WebAssembly community has for pressuring everyone, even powerful companies, to play by the same rules, and while I recognize it has practical limitations, I am opposed to weakening it.

May 27 '20 18:05 sunfishcode

@sunfishcode

I think it would be useful for you to explain your position some more - it's a novel one that I've never encountered before. Perhaps I am misunderstanding you.

To avoid confusion, here is my position: any wasm VM should conform to the wasm spec, period. Toolchain components that consume and produce wasm should do so according to the spec, period. However, a toolchain may also do more things. For example, wabt today can read wasm and emit a C-like format using wasm-decompile, and binaryen can read wasm and emit asm.js-like JS using wasm2js. Those output formats are not wasm.

wasm-decompile doesn't emit runnable code, but wasm2js does, and it does not 100% conform to wasm semantics, intentionally, because that would not work - bounds checking and alignment checking each load and store would be too slow when using JS Typed Arrays. This is ok because wasm2js output is not wasm, it's JS.

The proposed wasm2c stuff in this PR is directly analogous: it's not wasm, it's C.

(If you disagree, do you also disagree on wasm-decompile and/or wasm2js?)

May 27 '20 23:05 kripken