atomic.fence validation and binary encoding
At the last CG meeting, we agreed to add atomic.fence.
Validation-wise, the instruction has type ([] -> []). Unlike other atomic.* instructions, it is not a validation error for the instruction to occur in a module with a non-shared memory. This is because the instruction fences all memories in the store, not just that of the current module.
Encoding-wise, I hope someone with better intuition about the binary format can suggest something.
I don't quite understand why fence is allowed to be used with non-shared memories but other atomic instructions are not. Can you elaborate on that a bit?
The C, C++ and Rust fence constructs have an "order" parameter. The current wasm threads proposal only supports SC, but other orderings may be added in the future. Would it make sense to include an immediate field in the fence encoding, to allow for other orderings to be specified in the future?
@tlively, a fence is not tied to any memory, but commits all accesses (which may involve multiple memories or even other state than memories in the future) that have occurred in the thread (which may include accesses performed in other modules on the same thread). So it isn't meaningful to require a specific memory to be present locally.
The C, C++ and Rust fence constructs have an "order" parameter. The current wasm threads proposal only supports SC, but other orderings may be added in the future. Would it make sense to include an immediate field in the fence encoding, to allow for other orderings to be specified in the future?
I think this makes sense. The same point holds for other atomic operations - any order encoding for fence should try to be consistent with what they are/will be doing. I can't see anything explicit, but is the memarg part of the existing atomics' binary encoding intended to be used this way in the future? (https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md#spec-changes)
I can't see anything explicit, but is the memarg part of the existing atomics' binary encoding intended to be used this way in the future?
Yes, I assumed we would use the memarg immediate for other orderings in the future. We may want to use a single 0 byte for atomic.fence instead of memarg, since the alignment/offset doesn't make sense here, though
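For reference, here is a rough sketch of how an existing atomic access encodes its memarg immediate today (two unsigned LEB128 fields: alignment exponent, then offset); `encode_i32_atomic_load` and `leb128` are just illustrative helpers, not anything from the proposal:

```cpp
#include <cstdint>
#include <vector>

// Unsigned LEB128, as used for wasm integer immediates.
static void leb128(std::vector<uint8_t>& out, uint32_t v) {
  do {
    uint8_t b = v & 0x7F;
    v >>= 7;
    if (v != 0) b |= 0x80;  // high bit set => more bytes follow
    out.push_back(b);
  } while (v != 0);
}

// i32.atomic.load is the 0xFE prefix plus opcode 0x10, followed by a
// memarg: alignment exponent, then offset. The alignment field's spare
// bits are the natural place to encode a weaker ordering later.
std::vector<uint8_t> encode_i32_atomic_load(uint32_t align_log2, uint32_t offset) {
  std::vector<uint8_t> bytes{0xFE, 0x10};
  leb128(bytes, align_log2);
  leb128(bytes, offset);
  return bytes;
}
```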
@rossberg, I buy that a fence commits all accesses, even on a single thread, and I think that is a good argument for allowing it in modules without shared memories. But by extension of the same reasoning shouldn't all other atomic operations be allowed in these modules as well? They could still have their ordering guarantees on a single thread, they just wouldn't be as useful.
Also, I still don't quite see how a fence on a single thread would be useful. Can you give me an example where a single-threaded program could be observably different if it included a fence?
@tlively, all other atomic operators operate on a specific memory -- via the memory index 0 that is currently implicit in the instruction but may be explicit once we allow multiple memories. So memory 0 needs to exist in the module.
A fence doesn't have any effect in a single-threaded program. However, my point was that it still has an effect on everything in the same thread in a multi-threaded program. Consider:
(module $A
(memory (import "" "mem") 1 shared)
(func (export "rd") (param $i i32) (result i32)
(i32.load8_u (local.get $i))
)
(func (export "wr") (param $i i32)
(i32.store8 (local.get $i) (i32.const 0xff))
)
)
(module $B
(func $rd (import "A" "rd") (param i32) (result i32))
(func $wr (import "A" "wr") (param i32))
(func $run (param $i i32) (result i32)
(call $wr (local.get $i))
(atomic.fence)
(call $rd (i32.eqz (local.get $i)))
)
)
Imagine you instantiate both $A and $B twice in different threads, but sharing the same zeroed memory. And in one you invoke (call $run (i32.const 0)) and in the other (call $run (i32.const 1)). Despite not seeing any memory in scope, the fence guarantees that they cannot both return 0 (cf. the running example in Conrad's slides).
Just to make @rossberg's example completely tight, the load and store in module $A should be release/acquire atomics.
I think the disconnect here is that not declaring a shared memory doesn't imply that the module is only going to be used in a single-threaded context - that just happens to be (kind of) true for the instructions we've previously defined.
I can't see anything explicit, but is the memarg part of the existing atomics' binary encoding intended to be used this way in the future?
Yes, I assumed we would use the memarg immediate for other orderings in the future. We may want to use a single 0 byte for atomic.fence instead of memarg, since the alignment/offset doesn't make sense here, though.
@binji would a reasonable encoding be something like
memargf ::= 0x00
atomic.fence ::= 0xFE 0x0F m:memargf
to keep the convention that (only) atomic memory accesses have a non-zero first nibble, while still positioning fence as close to them as possible (leaving no gap)?
@conrad-watt that looks right to me, though I think we'd just inline the memargf (as with call_indirect in this definition). As for the opcode to choose, I don't have a strong opinion. I was thinking it would be 0x03, but not for any good reason, really.
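Putting the two suggestions together, the encoding under discussion is just three fixed bytes. A sketch (the opcode byte was still undecided at this point — 0x0F per @conrad-watt's grammar, 0x03 per @binji — so it is a parameter here; `encode_atomic_fence` is a hypothetical helper):

```cpp
#include <cstdint>
#include <vector>

// atomic.fence as proposed above: the 0xFE atomic prefix, a one-byte
// opcode, and a single zero byte where other atomics put their memarg.
std::vector<uint8_t> encode_atomic_fence(uint8_t opcode = 0x0F) {
  return {0xFE, opcode, 0x00};
}
```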