
Record and Replay Support for Component Model

Open arjunr2 opened this issue 5 months ago • 5 comments

Brief

This PR adds built-in support for deterministic record and replay (RR) of Wasm components in Wasmtime. It received an initial round of discussion in the Wasmtime bi-weekly meeting on 07/17.

Motivation

RR is a very useful primitive for improving the debugging story of Wasm in Wasmtime. Bugs that are often encountered in modules during deployment can be deterministically reproduced. In particular, it provides the foundation for the following (to name a few):

  • Reverse-execution (a.k.a. time-travel) debugging
  • Offline static/dynamic analysis of prior module executions
  • Profiling of module/runtime components
  • Automatic extraction of differential unit-tests for system interfaces (e.g. WASI)
  • Interposition points for targeted fuzzing of system interfaces and/or modules

Scope

This initial PR provides the base primitives for recording and replaying events. It supports RR at all import function boundaries and for the lowering rules of component types. The RR event infrastructure is intended to be easily extensible to new event types as new use-cases emerge.
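To make the import-boundary idea concrete, here is a minimal, std-only sketch of the record and replay halves. Every name in it (`RrEvent`, `Recorder`, `Replayer`, `call_import`) is hypothetical and chosen for illustration; it is not the event type or API introduced by this PR, and it models lowered values as plain `u64`s for brevity.

```rust
use std::collections::VecDeque;

/// One guest-host boundary crossing. Hypothetical shape, not Wasmtime's.
#[derive(Debug, Clone, PartialEq)]
enum RrEvent {
    /// The guest called an import with these (already lowered) arguments.
    ImportCall { func_index: u32, args: Vec<u64> },
    /// The host returned these values to the guest.
    ImportReturn { results: Vec<u64> },
}

/// Records events while the real host functions run.
#[derive(Default)]
struct Recorder {
    events: Vec<RrEvent>,
}

impl Recorder {
    /// Runs the host function and logs both the call and its results.
    fn call_import(
        &mut self,
        func_index: u32,
        args: &[u64],
        host: impl FnOnce(&[u64]) -> Vec<u64>,
    ) -> Vec<u64> {
        self.events.push(RrEvent::ImportCall { func_index, args: args.to_vec() });
        let results = host(args);
        self.events.push(RrEvent::ImportReturn { results: results.clone() });
        results
    }
}

/// Replays a recorded trace without invoking any host function.
struct Replayer {
    events: VecDeque<RrEvent>,
}

impl Replayer {
    fn new(events: Vec<RrEvent>) -> Self {
        Self { events: events.into() }
    }

    /// Checks that the guest makes the recorded call, then hands back the
    /// recorded results instead of calling into the host.
    fn call_import(&mut self, func_index: u32, args: &[u64]) -> Result<Vec<u64>, String> {
        let expected = RrEvent::ImportCall { func_index, args: args.to_vec() };
        match self.events.pop_front() {
            Some(ev) if ev == expected => {}
            other => {
                return Err(format!("divergence: expected {expected:?}, trace had {other:?}"))
            }
        }
        match self.events.pop_front() {
            Some(RrEvent::ImportReturn { results }) => Ok(results),
            other => Err(format!("corrupt trace: expected a return event, found {other:?}")),
        }
    }
}
```

In this sketch the replayer never touches the host, which is what lets a trace be replayed outside the embedder that produced it; a mismatch between the live call and the recorded one is reported as a divergence rather than silently ignored.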

Primary Goals

  • Low-overhead recording (memory, compute, and trace size) and high-performance replay.
  • Full determinism during replay, which can run outside the embedder context.
  • An engine-agnostic trace recording format -- the goal is to purely capture guest-host boundary crossings (import calls and component model interactions) that can be reasonably interpreted by another component model compliant engine.

Non-Goals (Subject to Discussion)

  • A human-readable trace format. This belongs better in something like wit-bindgen, and/or as an independent tool over the low-level trace.
  • Replay support for updated versions of a recorded module -- this requires a much more coordinated effort from the producers as well to make this practically useful.

Initial Performance Numbers

Some initial runs on compression libraries like zstd show a 4-5% overhead on recording logic, excluding the disk I/O. This seems reasonable at the moment and likely doesn't need further optimization unless there are explicit use-cases.

Minor Todo

The following (minor) additions will be made in the coming days prior to potential merging:

  • Encoding hashes of modules in the recorded trace for validation
  • Generic writers/readers for RecordBuffer and ReplayBuffer
  • Feature gating all of RR
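As a rough illustration of the first two items, the sketch below writes a small trace header containing a hash of the module bytes and validates it before replay, with both sides generic over `std::io::Write` and `std::io::Read`. The `WSRR` magic bytes, the header layout, and the function names are assumptions made up for this example. FNV-1a is used only to keep the sketch dependency-free; a real trace would more plausibly store a cryptographic digest.

```rust
use std::io::{self, Read, Write};

/// Hypothetical magic bytes identifying a trace file.
const MAGIC: [u8; 4] = *b"WSRR";

/// 64-bit FNV-1a, a dependency-free stand-in for a real module digest.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Writes the header to any `Write` sink (file, socket, in-memory buffer).
fn write_header<W: Write>(mut out: W, module: &[u8]) -> io::Result<()> {
    out.write_all(&MAGIC)?;
    out.write_all(&fnv1a64(module).to_le_bytes())
}

/// Reads the header from any `Read` source and checks it against the
/// module that is about to be replayed.
fn check_header<R: Read>(mut input: R, module: &[u8]) -> io::Result<()> {
    let mut magic = [0u8; 4];
    input.read_exact(&mut magic)?;
    if magic != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a trace file"));
    }
    let mut hash = [0u8; 8];
    input.read_exact(&mut hash)?;
    if u64::from_le_bytes(hash) != fnv1a64(module) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "trace was recorded from a different module",
        ));
    }
    Ok(())
}
```

Keeping the reader and writer generic means the same code path can target a file during deployment and a `Vec<u8>` in tests.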

Questions for Maintainers

  • Do we get wasip1 for free by recording/replaying at this level?
  • What's typically the most idiomatic way to serialize `anyhow::Error`s?
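For context on the second question, one option (an assumption on my part, not an established Wasmtime convention) is to flatten the error into the messages of its `source()` chain and store those strings in the trace. The sketch below uses only `std::error::Error` so it stands alone; `RecordedError` is a hypothetical name. With anyhow, the same walk is available through `anyhow::Error::chain()`. The concrete error type and any backtrace are lost, so replay could reproduce the messages but not support downcasting.

```rust
use std::error::Error;
use std::fmt;

/// Serializable stand-in for an error: the message of each link in the
/// `source()` chain, outermost first.
#[derive(Debug, Clone, PartialEq)]
struct RecordedError {
    chain: Vec<String>,
}

impl RecordedError {
    /// Walks the `source()` chain and keeps only the rendered messages.
    fn capture(err: &(dyn Error + 'static)) -> Self {
        let mut chain = Vec::new();
        let mut current: Option<&(dyn Error + 'static)> = Some(err);
        while let Some(e) = current {
            chain.push(e.to_string());
            current = e.source();
        }
        Self { chain }
    }
}

impl fmt::Display for RecordedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.chain.join(": "))
    }
}

impl Error for RecordedError {}
```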

arjunr2 avatar Jul 18 '25 23:07 arjunr2

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to complete this check list:

  • [ ] If you added a new Config method, you wrote extensive documentation for it.

    Our documentation should be of the following form:

    Short, simple summary sentence.
    
    More details. These details can be multiple paragraphs. There should be
    information about not just the method, but its parameters and results as
    well.
    
    Is this method fallible? If so, when can it return an error?
    
    Can this method panic? If so, when does it panic?
    
    # Example
    
    Optional example here.
    
  • [ ] If you added a new Config method, or modified an existing one, you ensured that this configuration is exercised by the fuzz targets.

    For example, if you expose a new strategy for allocating the next instance slot inside the pooling allocator, you should ensure that at least one of our fuzz targets exercises that new strategy.

    Often, all that is required of you is to ensure that there is a knob for this configuration option in wasmtime_fuzzing::Config (or one of its nested structs).

    Rarely, this may require authoring a new fuzz target to specifically test this configuration. See our docs on fuzzing for more details.

  • [ ] If you are enabling a configuration option by default, make sure that it has been fuzzed for at least two weeks before turning it on by default.


To modify this label's message, edit the .github/label-messager/wasmtime-config.md file.

To add new label messages or remove existing label messages, edit the .github/label-messager.json configuration file.


github-actions[bot] avatar Jul 19 '25 02:07 github-actions[bot]

We'll want at least @alexcrichton to give this a review as well, maybe after we try some of the refactors mentioned here.

Happy to help out! (and agreed I'd like to once-over at some point)

How about scheduling a call when y'all are ready with the 3 of us? That'd probably be best to draw attention to any various areas and for me to ask some questions in a high-bandwidth way before going off to review on my own.

alexcrichton avatar Jul 25 '25 18:07 alexcrichton

How about scheduling a call when y'all are ready with the 3 of us? That'd probably be best to draw attention to any various areas and for me to ask some questions in a high-bandwidth way before going off to review on my own.

That'd be great! FWIW, I'm on PTO next week and the week after; please feel free to talk with Arjun directly before then if you both want, or I can join after Aug 11...

cfallin avatar Jul 25 '25 22:07 cfallin

@alexcrichton @fitzgen I'll take a pass through the comments early next week, and perhaps we can find a time later next week that works

arjunr2 avatar Jul 26 '25 00:07 arjunr2

Sounds good! Feel free to ping me on Zulip when ready

alexcrichton avatar Jul 26 '25 03:07 alexcrichton