move icon indicating copy to clipboard operation
move copied to clipboard

[move-vm] Removed most RCs from runtime

Open tnowacki opened this issue 2 years ago • 5 comments

Motivation

  • RCs are no longer necessary due to Move's reference safety checks
  • Some RCs are still needed to update clean/dirty status or hold onto external memory safely
  • I feel this greatly simplifies the layout of values in value_impl.rs
  • It does however make write_ref unsafe, as it could lead to memory issues if the reference safety checks are wrong or not run
  • I do not have a reference heavy bench mark to test if this makes any perf difference. My guess is it won't, but might help with memory usage

Test Plan

  • ran tests
  • I'm a bit nervous that our tests might not be exhaustive enough though....

tnowacki avatar Mar 29 '22 23:03 tnowacki

Still need to take a closer look, but here are some early feedbacks regarding unsafe fn.

I think one thing to point out is that write_safe is probably not the only thing that's unsafe -- read_ref and more broadly, anything that dereferences a CheckedRef (a.k,a, raw pointers) are also unsafe. Consider this case:

let locals = Locals::new(4);

locals.store_loc(0, Value::u64(0));
locals.store_loc(1, Value::u64(0));

let r1 = locals.borrow_loc(0).unwrap();
let r2 = locals.borrow_loc(1).unwrap();

std::mem::drop(locals);

r1.equals(r2);

I'm aware this is a terrible way to use the value APIs, but this is exactly the point of marking them as unsafe -- forcing the Move VM to annotate its usages, making it easier for us to review them and make sure they are indeed doing sane things.

Very perceptive :) I saw exactly this after working on #175, seems you just saw it a bit earlier than myself

tnowacki avatar Apr 01 '22 20:04 tnowacki

Not sure why the hardhat-tests are failing... Could you try to rebase?

vgao1996 avatar Apr 08 '22 17:04 vgao1996

I think this is kind of bummer for usage of the VM. We want to use references in adapters (doesn't Sui does so) and custom embeddings of the VM, as well as native functions implementations, avoiding the need for using unsafe all over the code. Unsafe Rust should be isolated. Can you wrap the unsafe usage somewhere inside the VM?

Anything with references in this implementation will be inherently unsafe. Even reading from them could be unsafe, if an improper write occurs. So there isn't really way to encapsulate the logic, which we already do for adapters, could do some endpoints, and can't really do for native functions:

On native functions: And there really isn't anything that can be done from the native function side of things, since native functions get references as inputs (and the native function implementation is not bound by Move's safety rules, the implementation could be wrong). Native functions already could have messed this up, but it would have been a graceful invariant violation. Now it will be some deeply disturbing, silent pointer failure. Which is potentially an issue. However, to me this doesn't feel like an issue in the sense that native functions are stating "I am extending the VM runtime", and as such you need to be just as safe as the VM runtime.

On the adapter: None of the safety concerns are exposed to the adapter. Value is only used inside of the runtime. The adapter talks to the VM with BCS serialized values, both for inputs and outputs. So this "wrapping" is already done indirectly by the usage of serialized values.

For other endpoints (testing infra and such): Its a bit annoying, and I could try to split the Value enum into reference and non reference. It would bloat the API but it would clean up at least a few endpoints... maybe...

tnowacki avatar Apr 11 '22 22:04 tnowacki

I think this is kind of bummer for usage of the VM. We want to use references in adapters (doesn't Sui does so) and custom embeddings of the VM, as well as native functions implementations, avoiding the need for using unsafe all over the code. Unsafe Rust should be isolated. Can you wrap the unsafe usage somewhere inside the VM?

Anything with references in this implementation will be inherently unsafe. Even reading from them could be unsafe, if an improper write occurs. So there isn't really way to encapsulate the logic, which we already do for adapters, could do some endpoints, and can't really do for native functions:

I wonder whether we differentiate here between unsafeness regards the Rust and the Move language? Accessing a dangling Move reference may result in an undefined value, but it should not result in a process crash. In contrast, wherever you use unsafe Rust, a process crash can happen because of programmer mistakes. It is acceptable to do this in some isolated internal contexts, but not Rustonian to require unsafe in public APIs.

Perhaps we need a safe wrapper around the Value implementation? If it costs a bit extra, I think this is acceptable. We can also keep the unsafe API open for the implementation of certain critical native functions, but one should not be forced into using it.

wrwg avatar Apr 12 '22 04:04 wrwg

Just a further note, not actionable but related to this PR.

If we want to have a more efficient implementation of values, I think we could go a different approach. What I sketch is motivated both by implementing the Move memory model on top of the EVM and for the Prover.

(a) Linear Memory

Borrow semantics is a model for linear addressable RAM, including pointer indirection and address arithmetic, and I think will be most efficiently implemented by reflecting those aspects. So why not represent any value as a byte-addressable region of memory. You can borrow offsets in this region, read values, and mutate them. An unsafe, untyped value representation would then be Value(*[u8]).

The lifetime of the region the value points to is guaranteed to be valid by our own implementation of memory allocation. We can also skip the GC (as Solidity does) assuming that transactions are short running. I.e. we use arena style allocation for *[u8]. The stack can still be shrinking and growing, but once allocated stack memory will stay in the VMs possession. In any case we can assume that *[u8] is always Rust-safe but possibly Move-unsafe, as we never give memory back to Rust.

As to how we layout memory, this was investigated as part of EVM Move. Assuming a field in a struct has a fixed offset, field borrowing simply amounts to Value(r) => Value(r[offs..offs+size]). Similar, borrowing a local amounts to slicing the stack frame, For globals we can give them their own region.

Nested structs can be inlined, or represented via indirections like vectors, which inherently need indirections because they have no fixed size.

(b) Generics and Variants

We can avoid enums for different value variants all together. A Value(*[u8]) can be only interpreted by providing a type. This type is either statically known to the program (no generics), or passed as a parameter to a function (with generics).

Since *[u8] always points to valid Rust memory and has a given size, we cannot destroy Rust integrity by using the wrong type when interpreting a value, including when mutating it.

There is one exception regards integrity, that is when we have an indirection to a vector (or struct if we choose so) embedded in *[u8]. In this case we must create a new Value from the data inside the region, which might be corrupted if we have the wrong type to interpret it. There are multiple ways how to deal with this:

  • Trust bytecode verification for internal VM code, and assume type soundness. For the external API, use ExtValue(TypeTag, Value)
  • Validate the indirection to be well-defined. As we know all memory we ever allocated, we can check whether the Value(*[u8]) constructed from the content of some other value actually points to a valid region. Valid means in the Rust sense, it can be still invalid in the Move sense.

wrwg avatar Apr 12 '22 05:04 wrwg