dejavu icon indicating copy to clipboard operation
dejavu copied to clipboard

Test the soundness of `vm::Value`

Open rpjohnst opened this issue 4 years ago • 2 comments

Edit: This has been addressed; this issue now tracks testing to ensure this doesn't get broken.

The VM stack contains three kinds of raw pointers, with relatively unclear rules around when and how they can be used soundly. This makes it hard to determine whether (modifications to) the interpreter are correct. Worse, it exposes the entirely-safe engine APIs to unsoundness- for example, this is the main blocker for implementing instance_create and instance_destroy.

These are the three kinds of pointers:

  • vm::Value can encode a vm::Array, which is itself an Rc<UnsafeCell>. However, because vm::Value is Copy, the Rc is not helpful at this level- it's there to implement copy-on-write arrays at the GML level.
  • vm::Row is an intermediate ptr::NonNull into a vm::Array. It is intended to be "short-lived" (i.e. not stored in instances) but we may still want Dejavu to be allowed to produce optimized bytecode such that it lives across calls to other scripts or even engine APIs.
  • A with iterator is a ptr::NonNull to some array of vm::Entitys owned by the vm::Thread or vm::World. It is "short-lived" in the same way as vm::Row, but unlike vm::Row is required by GML semantics to live across calls in the body of the with loop.

There are two separate-but-related problems we need to avoid:

  • The most obvious problem is that these pointers are not tracked by the borrow checker, making it easy to use one after its target is gone, on accident. This risk is mitigated somewhat for vm::Row and with iterators because they are confined to the VM stack, but it's still there.
  • The second problem is scarier and more subtle- the interpreter or the engine might violate Rust's aliasing rules and hit undefined behavior. This risk is also mitigated somewhat for raw pointers, but the code was not written with very much attention to this detail.

What I'd like to do to build confidence that this is sound:

  • [x] Figure out a way to resolve the fact that Value contains an Rc but is also Copy- maybe this is okay if we can enforce that there is always a corresponding live Rc on the VM stack, or maybe we need to drop the Copy impl. (#45)
  • [x] Fix the way with and instances work so it doesn't immediately explode if you e.g. instance_create in a with body. @Zekka suggested making the arrays copy-on-write, which solves the use-after-free issue; we just need to work out the specifics. (#46)
  • [x] Encode the above in the types exposed to the engine. We might need to expose the concept of a frame lifetime, add some reference counting, etc.
  • [x] Audit the whole thing and fix any remaining shenanigans with references and other kinds of aliasing.
  • [ ] Expand the test suite with this in mind and run it through Miri, ideally on CI. The current test suite passes, but it doesn't exercise much of this stuff.

rpjohnst avatar Apr 04 '20 17:04 rpjohnst

Regarding the first checkbox (Value contains an Rc but is also Copy) I believe my original intent was that Values outside of the VM stack were really more like RcBorrows- they don't increment the refcount, but they don't decrement it either. But to expand on what the checkbox says: this isn't quite enough, because they don't actually borrow from anything.

This suggests a way to drop the Copy impl without drastically increasing refcount traffic- most of what the interpreter and engine APIs do can be done without a fully owned Value, so they can deal in &Values and use clone when they need to keep it around longer.

On its own, this just replaces all that refcount traffic with a bunch of extra references instead. It may be better to introduce something like a ValueRef<'v> type with the same representation as the Value it borrows from, but Copy like the original implementation of Value.

rpjohnst avatar Apr 28 '20 00:04 rpjohnst

I've addressed the known issues here, so the remaining work is to make sure we have good test coverage. This might include running the test suite under Miri and other sanitizers, as well as fuzzing.

rpjohnst avatar Aug 08 '20 21:08 rpjohnst