Universal checkpoint functionality across Wasm VMs

Open MrHate opened this issue 4 years ago • 2 comments

Since Wasm VMs manage most of the runtime state, is it feasible to encapsulate a minimal closure of that state as a universal checkpoint that is portable across different Wasm VMs?

Motivation

  • Support a lightweight, user-level checkpoint facility
  • Provide a building block for more sophisticated checkpointing techniques
  • Help migrate Wasm instances across hybrid-cloud frameworks built on diverse Wasm VMs
  • Enable fast initialization strategies such as suspend-and-clone
  • Enable function-granularity live update strategies

Overview

Although Wasm VMs differ in how they implement interpretation, instances of the same Wasm module running on different VMs have a great deal in common, such as function code, import requirements, and even the stack layout. We can treat a running Wasm instance as the combination of a maximal part that can be shared and a minimal part that describes the running state; only the latter is extracted into checkpoints.

When to checkpoint

For simplicity and feasibility, checkpoint generation should wait until control flow returns to the main module (rather than an imported function). Instances loaded from a checkpoint can then interact with imported functions through the same interfaces, even if the implementations differ. Host VMs can thus keep their own library function implementations as before, although validating the imported functions might become a little more intricate.
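
A minimal sketch of that deferral rule, in Python for brevity. `CheckpointGate`, `call_import`, and `request_checkpoint` are hypothetical names for illustration, not part of any existing VM; the idea is only that a request arriving while an imported function is running is postponed until control is back in the main module.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CheckpointGate:
    """Defers checkpoint requests while an imported (host) function runs."""
    host_call_depth: int = 0
    pending: bool = False
    taken: List[str] = field(default_factory=list)

    def call_import(self, name: str, fn: Callable[[], object]) -> object:
        # Entering host code: its frames are VM-specific and not portable.
        self.host_call_depth += 1
        try:
            return fn()
        finally:
            self.host_call_depth -= 1
            # Back in the main module: serve a deferred request, if any.
            if self.host_call_depth == 0 and self.pending:
                self.pending = False
                self._take(f"after {name}")

    def request_checkpoint(self) -> bool:
        """Return True if taken immediately, False if deferred."""
        if self.host_call_depth > 0:
            self.pending = True
            return False
        self._take("immediate")
        return True

    def _take(self, label: str) -> None:
        # Placeholder: a real VM would serialize the instance state here.
        self.taken.append(label)
```

With this gate, a request issued from inside an import (say, a hypothetical `fd_write`) is recorded and served as soon as that import returns.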

Validation

We should also validate a checkpoint when loading it, and carry the validation-related data within the checkpoint itself. Load-time validation mainly checks that the checkpoint is well-formed and that it matches the target Wasm module.
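
One way the load-time check could look, sketched in Python. The container layout (a `WCKP` magic tag, a version, a state length, and a SHA-256 digest of the module binary) is an assumption made up for this example and is not part of any spec; it only shows "well-formed" and "matches the target module" as two separate checks.

```python
import hashlib
import struct
from dataclasses import dataclass

MAGIC = b"WCKP"  # hypothetical 4-byte tag, not defined by any spec
VERSION = 1
HEADER_LEN = 4 + 8 + 32  # magic + (version, state length) + digest


@dataclass(frozen=True)
class Checkpoint:
    module_sha256: bytes  # digest of the Wasm binary the checkpoint came from
    state: bytes          # opaque runtime state (memory, globals, stack, ...)


def encode(cp: Checkpoint) -> bytes:
    header = MAGIC + struct.pack("<II", VERSION, len(cp.state))
    return header + cp.module_sha256 + cp.state


def load(blob: bytes, module_bytes: bytes) -> Checkpoint:
    # Well-formedness checks.
    if len(blob) < HEADER_LEN or blob[:4] != MAGIC:
        raise ValueError("malformed checkpoint: bad magic or truncated header")
    version, state_len = struct.unpack("<II", blob[4:12])
    if version != VERSION:
        raise ValueError(f"unsupported checkpoint version {version}")
    digest, state = blob[12:44], blob[44:]
    if len(state) != state_len:
        raise ValueError("malformed checkpoint: state length mismatch")
    # Module-match check.
    if digest != hashlib.sha256(module_bytes).digest():
        raise ValueError("checkpoint does not match the target module")
    return Checkpoint(digest, state)
```

A real format would also need to validate the state payload itself (for example, that saved stack values type-check against the module), which this sketch leaves opaque.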

MrHate, Apr 12 '21 07:04

Thanks for filing this issue, I'm wondering if you could dive into the motivation a little bit more? Examples of hybrid-cloud frameworks would be useful. Given the breadth of VMs, my concern would be that finding something universal that works across VMs would be hard.

dtig, Apr 29 '21 17:04

Since such a feature deeply depends on the implementation specifics of a particular engine, and could be very expensive to implement, I don't see how this would make sense to specify as part of the core Wasm spec.

Note that you may be able to implement something like this functionality today entirely in user space. You could use Emscripten's asyncify to have code injected that can unwind the stack and save its state (since the regular Wasm stack is not user-serializable), and the JS code could then serialize the linear memory (and potentially diff/compress it). You should then be able to rewind the stack upon reinstating a checkpointed memory. @kripken
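
A rough sketch of just the "serialize the linear memory (and potentially diff/compress it)" step, written in Python rather than JS for brevity. It assumes the stack has already been unwound and its state saved into linear memory, and that the memory did not grow between snapshots; `diff_snapshot` and `restore` are hypothetical helper names.

```python
import zlib

PAGE = 65536  # Wasm linear-memory page size in bytes


def _xor(a: bytes, b: bytes) -> bytes:
    # XOR two equal-length byte strings via big integers.
    n = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return n.to_bytes(len(a), "little")


def diff_snapshot(base: bytes, current: bytes) -> bytes:
    """XOR current memory against a base image, then compress the delta."""
    if len(base) != len(current):
        raise ValueError("sketch assumes memory size is unchanged")
    return zlib.compress(_xor(base, current))


def restore(base: bytes, packed: bytes) -> bytes:
    """Rebuild the checkpointed memory from the base image and the delta."""
    delta = zlib.decompress(packed)
    if len(delta) != len(base):
        raise ValueError("delta does not fit the base image")
    return _xor(base, delta)
```

Because unchanged bytes XOR to zero, a mostly untouched memory compresses to a small delta; the restored bytes would then be copied back into the instance's memory before rewinding.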

aardappel, Apr 29 '21 18:04