Should wake trap for out-of-bounds addresses?
Imagine that a `memory.shrink` operation is added, and that a thread waits on an address, then the memory is shrunk so that the address is out of bounds. If wake traps on out-of-bounds addresses (as it is defined to do now), it's not possible to wake the waiting thread without first growing the memory back to include the wait address.
Given that wake doesn't use its address operand to access memory, I think it shouldn't do any bounds checking on it.
I think the reasonableness of this proposal depends in part on the underlying behaviour of OS functions like wait_on_bit, wake_up_bit and so on. If any implementation naturally would trap on invalid addresses, this proposal would mean additional runtime checks rather than fewer.
A quick browse of the linux source code leads me to believe that the wake function above wouldn't trap, but I'm not an expert, and I also don't know about the behaviour of other OS's.
Aside from this, in the example you give, the wait operation must be racing with the shrink, so on some executions the wait will trap, and whatever write event it is waiting on could also trap in the presence of the shrink. There's a value in keeping the trapping behaviour more uniform across different interleavings, even if it means an extra bounds check in wake (which could(???) be dwarfed by the cost of the OS calls/machinery required for the wake operation anyway).
That's the specification point of view anyway. I think this needs insight from implementers. @jfbastien do you have any thoughts?
> I think the reasonableness of this proposal depends in part on the underlying behaviour of OS functions
There's an old thread where I raised the issue that wait/wake don't seem to map to the OS functions. I don't remember the details, but I think the OS functions I looked at (both `WaitOnAddress`/`WakeByAddress` on Windows, and `futex` on Linux) allow spurious wakes where SAB/WebAssembly do not.
> Aside from this, in the example you give, the wait operation must be racing with the shrink, so on some executions the wait will trap, and whatever write event it is waiting on could also trap in the presence of the shrink (unless the wait is redundant). There's a value in keeping the trapping behaviour more uniform across different interleavings, even if it means an extra bounds check in wake (which could(???) be dwarfed by the cost of the OS calls/machinery required for the wake operation anyway).
If I understand correctly, what you'd actually want, in order to avoid inconsistent trapping when the wait races a shrink, is for the waiting thread to trap on wake, and for a memory shrink to wake (and thereby trap) any waiters on the pages being removed by the shrink.
The motivation of this thread was actually that I noticed a race in my implementation of `atomic.wake` where bounds-checking the address and the wake operation are not atomic, so it will trap inconsistently if it's racing an operation that changes the memory size.
However, I'm now realizing that if the ability to `mmap` a shared buffer is added, then this is more complicated. You don't want to wake the virtual address in your memory, you want to wake the physical address that it maps to, so you can wake a thread waiting on the same physical address via a different virtual address. To be clear, I'm not referring to the hardware-level virtual and physical addresses, only making an analogy to them. But it means that `wake` needs to map the virtual address to a physical address, so it makes sense for `wake` to trap on out-of-bounds addresses.
My concerns here would be:
- Runtime cost
- Surprising behavior in corner cases
In general, wake / wait isn't fast-path, so a bounds check isn't really expensive.
I don't see either proposed approach as being particularly surprising. It's an extreme corner case to wait on an address and then shrink it out, and any behavior seems fine. We could also decide that shrink has to trap if a waiter would go out-of-bounds, but that adds implementation complexity for very little upside (because doing that check in a non-racy manner is tricky).
I agree there's some degree of raciness in the example Andrew provided.
Similarly, I'd consider what support for memory protection would do: that's not just shrinking. What do we do if you wake / wait on a no-read and no-write page? We should do the same as a shrunk-out address IMO.
So maybe wake / wait should trap on OOB from the max size, but not trap otherwise?
I think there are two consistent views here.
1. The address used by wait/wake, after wait's read has been performed, is interpreted as merely an index/hash into a table/queue of waiting threads, and doesn't have anything to do with accessing memory. In this situation, protect and shrink will cause a wait's read to trap, but aside from this, there's no concept of OOB/protection. ~~I think this is what linux is doing in the example above.~~
2. Something along the lines of an abstract capability machine, where ability to wait/wake is equated with permission/ability to read/write (respectively, and in terms of both bounds-checks and protection), at the moment of instruction execution. So a thread can wait on an address which is then protected/shrunk, and will stay waiting, and any wake on that address will trap. ~~If the memory is later unprotected/grown, a wake on this address will wake the waiting thread.~~ EDIT: @AndrewScheidecker makes a good point below.
I don't have any opinion on which one is better. 1 is probably ~~closer to reality and~~ more easily implementable, 2 is maybe conceptually slightly less surprising.
EDIT: JS does 2, since its `Atomics.notify` is spec'd as using `ValidateAtomicAccess`. But I don't think it's overly important to align here.
I started this issue to suggest switching from option 2 to 1, but now I think it would be best to keep option 2.
If something like `mmap` is added then it would be natural to redefine wait/wake to be keyed by an abstract storage address rather than the address used to access a specific linear memory. That implies that both wait and wake need to map the linear address to a storage address, so it makes sense to trap when waking unmapped/protected/out-of-bounds addresses.
Redefining wait/wake to be keyed by an abstract storage address would also mean that shrinking/growing a memory would be defined in terms of unmapping the pages at the end, then remapping them to new abstract storage addresses. If you were to wait on some linear address in a page, then unmap the page, then remap it with a new abstract storage address, then a wake to the same linear address should not match the wait.
I think `mmap` is still a pretty speculative WebAssembly extension (more so than `memory.shrink` or `memory.protect`), but I think the potential interactions with wait/wake are a pretty strong argument to stay with option 2 for now.
My next question is: should shrinking and growing a memory imitate a future WebAssembly with `mmap`, where the same linear address does not wake a thread that waited on the page before it was shrunk away and then regrown?
Ah, I didn't previously completely get your point about the distinction between virtual and physical memory in this scenario, but I think I understand now.
My impression would be that either option biases against particular implementations, but since I'd expect `memory.shrink` to only be called in truly exceptional circumstances that warrant a genuine unmapping behind the scenes, it probably makes sense to not allow the wake in your last scenario.
> My next question is: should shrinking and growing a memory imitate a future WebAssembly with `mmap` where the same linear address does not wake a thread that waited on the page before it was shrunk away and then regrown?
If there's a wait on an address that's going away and we make it impossible to regain access to that address (because the next time we grow the memory past it we get a different instance of a memory location at that address, notionally), then we have to decide what happens with the waiting threads. It's somewhat consistent with current reality to wake those threads when we shrink the memory, but we'd have to wake them with a result code that we don't currently have (some type of cancellation).
An alternative might be to fail the shrink/mmap/mprotect if there are outstanding waits on any address being unmapped. This is unconventional but consistent, I think. It may make porting software harder. But, as JF says, these are extreme edge cases anyway.
FWIW, I also think we should trap as per normal if a wake targets a nonexistent/unmapped address, again for consistency's sake.
> It's somewhat consistent with current reality to wake those threads when we shrink the memory, but we'd have to wake them with a result code that we don't currently have (some type of cancellation).

> An alternative might be to fail the shrink/mmap/mprotect if there are outstanding waits on any address being unmapped.
Is there a reason not to just orphan those threads as unwakeable?
> Is there a reason not to just orphan those threads as unwakeable?
The more we try to be "nice", the more complex it'll be. Checking whether an address is mapped is inherently racy, and making it non-racy would slow down the part that matters. All that for a silly use case being nicer.
So maybe we can do something nice, but only if it's really easy and not racy.
> Is there a reason not to just orphan those threads as unwakeable?
It's sort of nasty to do so, though of course JF is right about the complexity and costs of trying to be nice.
I wanted to briefly follow up on something @conrad-watt said since it might lead to confusion if not clarified.
> - Something along the lines of an abstract capability machine, where ability to wait/wake is equated with permission/ability to read/write (respectively, and in terms of both bounds-checks and protection), at the moment of instruction execution. So a thread can wait on an address which is then protected/shrunk, and will stay waiting, and any wake on that address will trap. If the memory is later unprotected/grown, a wake on this address will wake the waiting thread. ... JS does 2, since its `Atomics.notify` is spec'd as using `ValidateAtomicAccess`. But I don't think it's overly important to align here.
JS does not do 2 in any interesting sense because shared memory cannot be shrunk in JS. Nonshared memory can become detached and thus unavailable, but shared memory in JS is (at the moment) fixed size. So looking to JS here is probably not fruitful.
What we do with our memories will of course impact JS if JS can obtain handles to those memories, as it can currently, and we probably can't avoid that in the future either - any program that shares a memory between multiple modules will need a memory that JS will be able to observe. So aligning with JS is likely inevitable.
Thanks, I was definitely making some unjustified assumptions about how ValidateAtomicAccess would be extended in these cases.
> any program that shares a memory between multiple modules will need a memory that JS will be able to observe. So aligning with JS is likely inevitable.
Perhaps, though I can imagine a future where we provide a `WebAssembly.Memory` object that throws when you access the `buffer` property. It would be unusable from JavaScript, but could be exported and passed to other wasm modules. With the host-bindings proposal, it could export objects that are converted to JS types without having to expose the buffer to JavaScript.
@binji, fair point - especially in a multi-memory setting, or in a setting where JS is less dominant than now...