threads
threads copied to clipboard
Bounds/permission checks during instantiation
https://webassembly.github.io/spec/core/exec/modules.html#instantiation
The current specification for instantiation is all or nothing - if fail occurs during the process, the store and memories are not altered.
For example, in step 10, there is a bounds check to ensure that each data segment initialiser fits within the bounds of the memory, and if any of them fail, no copying takes place at all, and the whole instantiation aborts.
This is quite a strong specification, especially now that the memory being initialised may have been imported from another thread. Conceptually, all memory bounds must be explicitly calculated and checked at the start of instantiation, so that we never get in a situation where one data segment initialiser succeeds (and can be observed by another thread) but a subsequent one fails.
This is possible because the size of the memory will only ever increase, so implementations may in practice interleave mem.grows of other threads with the initialisation, so long as the bounds were calculated at the start.
However, if mem.protect is added, this "monotonicity" of memory writability will no longer hold. There is no ahead of time check that can be made to guarantee that the memory will be writable at the instant of data segment initialisation. I'm having trouble conceptualising a sane implementation that offers this all or nothing initialisation for a shared memory in the presence of mem.protect.
What do people think should happen in this case? Does this point to a fundamental problem with all or nothing initialisation?
My hot take: disallow data segment initialisers ("active" data segments) for shared memories at the validation level - instead force people to use the bulk memory proposal's instructions in the start function. This would make initialisation order, ahead of time bounds checking (if desired), and trap behaviour explicit, and probably avoids more gotchas down the road.
That's an excellent point. I think with the new bulk operators it makes total sense to treat active segments as a legacy feature that cannot be combined with any new functionality like shared memories.
Just to totally enumerate the space of possibilities:
-
The solution above - shared memories can't have active segments.
-
Change the semantics of instantiation so that initialisers just become a shorthand for sequentially executed bulk operations, immediately before the start function, without an ahead of time bounds check. This results in the neatest spec of all the options, but could be a breaking change since a failed bounds check could result in some initialisers applied to memory but not others.
-
Initialisation still does an ahead of time bounds check, but does not guarantee all-or-nothingness in the presence of mem.protect. I don't like this one since it creates a weird edge in what instantiation guarantees.
If I had total freedom, I'd lean towards (2). (1) is a good compromise that doesn't change any existing semantics.
I'm actually surprised that this bounds checking is in the spec, since an imported memory already has a declared minimum size in its type, right? The only way you could observe a different behaviour with (2) would be if you chose to give your memory a type which is too small for your data initialisers (which I guess could be happening in autogenerated code).
The other way in which the segment bounds check can fail is if the offset given in the segment is determined by an imported global, and that is somehow off.
I would be fine with option 2 as well, though others might not. I agree that option 3 sounds rather dirty.
It looks like option 2 lines up with @jfbastien's position in a prior discussion: https://github.com/WebAssembly/spec/pull/399 https://github.com/WebAssembly/design/pull/902
Is this something we can change without breaking the web?
EDIT: trying to collate previous discussions on this: https://github.com/WebAssembly/design/issues/889 https://github.com/WebAssembly/design/issues/897
More generally, we may end up running into other issues that force us to abandon atomicity. (Even today, the start function can fail mid-way through...) So maybe (2) is just better since it's simpler.
Is this something we can change without breaking the web?
I think option 2 should be safe since it's just loosening a restriction on a module that would otherwise trap. It could theoretically break someone who was expecting to trap in that case, but that seems unlikely.
I'm OK with option 1 or 2, with a slight bias to option 1.
If there are no objections, I will write a tentative PR against the main spec for option 2, and we can have some concrete discussions there.