wasm-c-api icon indicating copy to clipboard operation
wasm-c-api copied to clipboard

Expose bounds-checked memory access API

Open AndrewScheidecker opened this issue 5 years ago • 6 comments

The API currently exposes the contents of a memory as:

byte_t* wasm_memory_data(wasm_memory_t*);
size_t wasm_memory_data_size(const wasm_memory_t*);

This effectively delegates the bounds checking to the API client. The problem is that this doesn't expose a race-free way to bounds check a memory access in the case of shared memories, and even in the single-threaded case can be hard for the client to get right.

An API that would make it easy for API clients to do the right thing is:

bool wasm_memory_lock_range(wasm_memory_t*, size_t offset, size_t size, byte_t** out_range_base);
void wasm_memory_unlock_range(wasm_memory_t*, size_t offset, size_t size);

This shouldn't involve any actual locking as long as the VM is single-threaded, or knows the memory can't shrink, unmap/protect pages, etc. If the VM is multi-threaded and supports shrinking/unmapping/protecting pages, then I imagine you'd use a rwlock to serialize these range-locks with those operations that mutate the bounds-checking state.

AndrewScheidecker avatar Jun 21 '19 12:06 AndrewScheidecker

Note that the API does not yet support the threads proposal (e.g., no shared memories are expressible yet). So far I had assumed that staying within bounds is the responsibility of the host, and that OOB access is simply UB. This is C after all. Not sure what the use case is for race-free bounds checks.

I don't have a strong opinion regarding your proposal, though I don't quite see how it could be implemented in general, and I suspect that it may be rather difficult, if not impossible, on many architectures. Any such feature will certainly require broader discussion among engine implementers.

rossberg avatar Jun 21 '19 13:06 rossberg

So far I had assumed that staying within bounds is the responsibility of the host, and that OOB access is simply UB. This is C after all.

In this case, UB is likely a sandboxing violation.

Even if your bounds-checking is UB free, you also have to worry about leaking information via speculative cache filling and other micro-architectural state.

Combine the difficulty of getting it right with the number of opportunities a typical embedder will have to screw it up, and I think it's worth providing a safer API.

Not sure what the use case is for race-free bounds checks.

This only applies if you have both shared memory and the ability to shrink a memory or unmap subregions of it: in that case you'd have to worry about the page you're trying to access being shrunk/unmapped between when you do your bounds check and the actual memory access.

I don't have a strong opinion regarding your proposal, though I don't quite see how it could be implemented in general, and I suspect that it may be rather difficult, if not impossible, on many architectures.

Can you elaborate on this?

AndrewScheidecker avatar Jun 21 '19 17:06 AndrewScheidecker

Keep in mind that this is an embedder API, not an API that should be made accessible to untrusted client code. The embedder has to be part of TCB. Since this is C, it can pretty much pierce every abstraction anyway, and it's outright impossible to maintain any sandboxing guarantees on that level.

As for the implementation, the embedder is given a raw C pointer into memory. How would you implement range checks on external dereferences?

rossberg avatar Jun 21 '19 18:06 rossberg

Keep in mind that this is an embedder API, not an API that should be made accessible to untrusted client code. The embedder has to be part of TCB. Since this is C, it can pretty much pierce every abstraction anyway, and it's outright impossible to maintain any sandboxing guarantees on that level.

Yes, the goal is not to protect against an untrusted embedder, but to make it easier for a trusted embedder to correctly bounds check accesses it makes to untrusted linear memory addresses.

As for the implementation, the embedder is given a raw C pointer into memory. How would you implement range checks on external dereferences?

The API would just guarantee that any accesses within a locked range are in bounds for the duration of the lock. The embedder would be responsible for ensuring it doesn't access memory outside of the range passed to wasm_memory_lock_range.

AndrewScheidecker avatar Jun 21 '19 18:06 AndrewScheidecker

I don't see how this API makes anything inherently less racy or dangerous. First, wasm memories cannot be shrunk, so once an index is in-bounds, it is in-bounds for as long as the instance lives. As for growing memories, wasm engines that support shared memory cannot really support growing by alloc+copy because it is too racy. (This is why shared memories must have a maximum size, so that engines can pre-allocate.)

titzer avatar Jun 23 '19 00:06 titzer

First, wasm memories cannot be shrunk, so once an index is in-bounds, it is in-bounds for as long as the instance lives.

Yes, the API doesn't support shared memories, much less shrinking them or unmapping/protecting pages in them. But we should consider the possibility of adding those features when designing the rest of the API.

To be clear, I'm not arguing for removal of wasm_memory_data, but that we should also provide some way to access memory with bounds checking.

AndrewScheidecker avatar Jun 23 '19 13:06 AndrewScheidecker