[WIP] Lower memory.copy
@kripken I haven't actually tested this for real yet, but can you give it a quick look-over for general sanity? I've not got a lot of experience using these APIs.
Hmph. I don't know if I agree with clang-format on the best way to format this, but I guess it's best to keep everything conforming :D
Ah, so here is where we have to decide how general-purpose we want the pass to be. If this is just for lowering away LLVM-produced memcpy and supporting older browsers, then we don't need to support multi-memory, memory64, or threads (because browsers that support those always support bulk memory). Not supporting threads means that the order of byte copies doesn't have to match the spec, because it's not observable. If we further assume that clang is the source of the memory ops, then we don't have to worry about pointer overflow because it's UB. For now I'm pretty happy with the former assumption because supporting more feature interactions would add a fair amount of complexity for no known use case. The latter I'm less sure about. I suppose it's possible we'd be running this lowering over mixed C+Rust code?
Those sound like reasonable assumptions to me.
But IIANM the order of copying matters when the ranges overlap, too:
[----------]
[----------]
Copying the former over the latter in the forward direction is fine, but in reverse will smear.
But maybe LLVM never emits a memory.copy where this might matter? Does it lower llvm.memcpy and/or llvm.memmove?
Btw, an issue I realized with limiting these passes to LLVM's output is that we can't run the spec tests on them or fuzz them, not normally at least.
Ah, good point about the order. LLVM does lower both to memory.copy (see WebAssemblySelectionDAGInfo.cpp). Yeah, my way of testing was just to run the passes unconditionally and send them through all of the emscripten tests. But that's probably not as good as including spec tests or fuzzing.
Btw, an issue I realized with limiting these passes to LLVM's output is that we can't run the spec tests on them or fuzz them, not normally at least.
I realized that another way to implement the trapping behavior inside wasm32 would be to extend the addresses to i64 and do the trap check in i64. Obviously it would add more complexity; do you think that would be worth it?
I don't follow, how would extending to i64 help here? Would we be able to check things we don't currently (what)?
Sorry, I was thinking of the comments above about possible overflow when checking for out of bounds trapping. Currently we don't handle that because UB, but if we wanted to run spec tests or fuzz, we'd presumably need to handle traps that also overflow the i32 range.
I see. We can also test for that overflow in 32-bit (x + y < x implies overflow if both are unsigned), but I don't know what's simpler.
I'm also not sure if full fuzzing and spec tests are worth it. We can get some fuzzing through emcc (using csmith to fuzz from C code).
Yeah my thinking was that extending to i64 would be simpler than adding the extra condition you mentioned for both src and dst. But I'm also leaning toward not bothering with the spec tests. I'm running all the emscripten tests with the pass enabled (and I'm considering adding another test mode that builds for MVP wasm which would have them enabled continuously? WDYT about that?) To get extra testing beyond the emscripten tests, I'm also going to run them with the LLVM testsuite tests.
OK, assuming we keep the current trapping behavior, and the current structure (i.e. of generating rather than merging the polyfill functions) I think this is ready.
I'm running all the emscripten tests with the pass enabled (and I'm considering adding another test mode that builds for MVP wasm which would have them enabled continuously? WDYT about that?) To get extra testing beyond the emscripten tests, I'm also going to run them with the LLVM testsuite tests.
I like the idea to add an MVP test mode. Off on CircleCI, just running on the testsuite bot?
I like the idea to add an MVP test mode. Off on CircleCI, just running on the testsuite bot?
Yeah, along with the ASan and similar modes.
I think we could add some execution tests for this, under
test/lit/exec/. The tests can contain some copy/fill operations, and print memory contents after the operation. By running them with--lower --fuzz-execwe would verify that the output is unchanged after the lowering.
I'm trying this out, but I'm not sure how to print the memory. The execution engine in wasm-opt doesn't seem to have the same magic spectest.print imports that wasm-shell has.
You can call the logging imports, see
https://github.com/WebAssembly/binaryen/blob/52cae1142bb8c06c98c4887ef9ffefff70dbeefc/test/lit/exec/fuzzing-api.wast#L25-L35
and their definitions earlier in that file.
Thanks, WDYT of the memory copy test? It's derived from the wast spectest but turned into a fuzz-exec test.
Test looks great! lgtm with the same for memory.fill.