atomic.fence validation and binary encoding
At the last CG meeting, we agreed to add atomic.fence.
Validation-wise, the instruction has type ([] -> []). Unlike other atomic.* instructions, it is not a validation error for the instruction to occur in a module with a non-shared memory. This is because the instruction fences all memories in the store, not just that of the current module.
Encoding-wise, I hope someone with better intuition about the binary format can suggest something.
I don't quite understand why fence is allowed to be used with non-shared memories but other atomic instructions are not. Can you elaborate on that a bit?
The C, C++ and Rust fence constructs have an "order" parameter. The current wasm threads proposal only supports SC, but other orderings may be added in the future. Would it make sense to include an immediate field in the fence encoding, to allow for other orderings to be specified in the future?
@tlively, a fence is not tied to any memory, but commits all accesses (which may involve multiple memories or even other state than memories in the future) that have occurred in the thread (which may include accesses performed in other modules on the same thread). So it isn't meaningful to require a specific memory to be present locally.
The C, C++ and Rust fence constructs have an "order" parameter. The current wasm threads proposal only supports SC, but other orderings may be added in the future. Would it make sense to include an immediate field in the fence encoding, to allow for other orderings to be specified in the future?
I think this makes sense. The same point holds for other atomic operations - any order encoding for fence should try to be consistent with what they are/will be doing. I can't see anything explicit, but is the memarg part of the existing atomics' binary encoding intended to be used this way in the future? (https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md#spec-changes)
I can't see anything explicit, but is the memarg part of the existing atomics' binary encoding intended to be used this way in the future?
Yes, I assumed we would use the memarg immediate for other orderings in the future. We may want to use a single 0 byte for atomic.fence instead of memarg, since the alignment/offset doesn't make sense here, though
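For reference, here is a rough sketch of how an existing atomic access encodes its memarg immediate today (two unsigned LEB128 fields: alignment exponent, then offset); `encode_i32_atomic_load` and `leb128` are just illustrative helpers, not anything from the proposal:

```cpp
#include <cstdint>
#include <vector>

// Unsigned LEB128, as used for wasm integer immediates.
static void leb128(std::vector<uint8_t>& out, uint32_t v) {
  do {
    uint8_t b = v & 0x7F;
    v >>= 7;
    if (v != 0) b |= 0x80;  // high bit set => more bytes follow
    out.push_back(b);
  } while (v != 0);
}

// i32.atomic.load is the 0xFE prefix plus opcode 0x10, followed by a
// memarg: alignment exponent, then offset. The alignment field's spare
// bits are the natural place to encode a weaker ordering later.
std::vector<uint8_t> encode_i32_atomic_load(uint32_t align_log2, uint32_t offset) {
  std::vector<uint8_t> bytes{0xFE, 0x10};
  leb128(bytes, align_log2);
  leb128(bytes, offset);
  return bytes;
}
```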
@rossberg, I buy that a fence commits all accesses, even on a single thread, and I think that is a good argument for allowing it in modules without shared memories. But by extension of the same reasoning shouldn't all other atomic operations be allowed in these modules as well? They could still have their ordering guarantees on a single thread, they just wouldn't be as useful.
Also, I still don't quite see how a fence on a single thread would be useful. Can you give me an example where a single-threaded program could be observably different if it included a fence?
@tlively, all other atomic operators operate on a specific memory -- via the memory index 0 that is currently implicit in the instruction but may be explicit once we allow multiple memories. So memory 0 needs to exist in the module.
A fence doesn't have any effect in a single-threaded program. However, my point was that it still has an effect on everything in the same thread in a multi-threaded program. Consider:
(module $A
(memory (import "" "mem") 1 shared)
(func (export "rd") (param $i i32) (result i32)
(i32.load8_u (local.get $i))
)
(func (export "wr") (param $i i32)
(i32.store8 (local.get $i) (i32.const 0xff))
)
)
(module $B
(func $rd (import "A" "rd") (param i32) (result i32))
(func $wr (import "A" "wr") (param i32))
(func $run (param $i i32) (result i32)
(call $wr (local.get $i))
(atomic.fence)
(call $rd (i32.eqz (local.get $i)))
)
)
Imagine you instantiate both $A and $B twice in different threads, but sharing the same zeroed memory. And in one you invoke (call $run (i32.const 0)) and in the other (call $run (i32.const 1)). Despite not seeing any memory in scope, the fence guarantees that they cannot both return 0 (cf. the running example in Conrad's slides).
Just to make @rossberg's example completely tight, the load and store in module $A should be release/acquire atomics.
I think the disconnect here is that not declaring a shared memory doesn't imply that the module is only going to be used in a single-threaded context - that just happens to be (kind of) true for the instructions we've previously defined.
I can't see anything explicit, but is the memarg part of the existing atomics' binary encoding intended to be used this way in the future?
Yes, I assumed we would use the memarg immediate for other orderings in the future. We may want to use a single 0 byte for atomic.fence instead of memarg, since the alignment/offset doesn't make sense here, though.
@binji would a reasonable encoding be something like
memargf ::= 0x00
atomic.fence ::= 0xFE 0x0F m:memargf
to keep the convention that (only) atomic memory accesses have a non-zero first nibble, while still positioning fence as close to them as possible (leaving no gap)?
@conrad-watt that looks right to me, though I think we'd just inline the memargf (as with call_indirect in this definition). As for the opcode to choose, I don't have a strong opinion. I was thinking it would be 0x03, but not for any good reason, really.
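Putting the two suggestions together, the encoding under discussion is just three fixed bytes. A sketch (the opcode byte was still undecided at this point — 0x0F per @conrad-watt's grammar, 0x03 per @binji — so it is a parameter here; `encode_atomic_fence` is a hypothetical helper):

```cpp
#include <cstdint>
#include <vector>

// atomic.fence as proposed above: the 0xFE atomic prefix, a one-byte
// opcode, and a single zero byte where other atomics put their memarg.
std::vector<uint8_t> encode_atomic_fence(uint8_t opcode = 0x0F) {
  return {0xFE, opcode, 0x00};
}
```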