
[sui tools] prototype of Sui surfer testing tool

Open sblackshear opened this issue 2 years ago • 4 comments

Implement a simple type-inhabitation-based bot for Sui for end-to-end testing and (possibly) semi-realistic transaction workload generation. Here's how it works:

  • Takes a list of package IDs and a wallet config as input
  • Processes the modules in each package in dependency order (leaves of the dep graph first)
  • In a loop, attempts to call each declared entry function in a module, recording each successful (i.e., non-aborting) call and skipping that function in subsequent iterations. The loop ends when a full pass fails to successfully call any new function. The reasoning behind this strategy is that many entry functions have dependencies (e.g., DevNetNFT::update depends on DevNetNFT::mint) that the surfer is not smart enough to infer, but trying repeatedly uncovers some (though not all) of these deps.
  • To call a function, the surfer looks at each parameter type and either generates a random value of that type (if it's a pure type) or looks for a value of the appropriate type in the wallet (if it's an object type). If it can't find a value of the given type, the attempt to call the function fails. (A minimal sketch of this loop and the argument-synthesis step follows this list.)
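Here is a minimal, self-contained sketch of that strategy in Rust. Everything in it (EntryFn, Wallet, execute, and the simplified type model) is a hypothetical stand-in rather than the real sui-surfer API, and randomness is elided so the example compiles with no dependencies:

use std::collections::{HashMap, HashSet};

enum ParamType {
    U64,            // a "pure" type: a value can be synthesized directly
    Object(String), // an object type tag: a value must be found in the wallet
}

enum Arg {
    Pure(Vec<u8>),  // serialized bytes for a pure value
    Object(String), // ID of an owned object of the right type
}

struct EntryFn {
    name: String,
    params: Vec<ParamType>,
}

struct Wallet {
    // object type tag -> IDs of owned objects of that type
    owned: HashMap<String, Vec<String>>,
}

impl Wallet {
    // Try to inhabit one parameter type; None means this call attempt fails.
    fn inhabit(&self, ty: &ParamType) -> Option<Arg> {
        match ty {
            // Pure type: any value will do (the real tool picks one at random).
            ParamType::U64 => Some(Arg::Pure(42u64.to_le_bytes().to_vec())),
            // Object type: reuse an object we already own, if there is one.
            ParamType::Object(tag) => self
                .owned
                .get(tag)
                .and_then(|ids| ids.first())
                .map(|id| Arg::Object(id.clone())),
        }
    }
}

// Stand-in for submitting a transaction; true = executed without aborting.
// In the real tool a successful call can also add objects to the wallet,
// which is what lets later passes reach dependent functions.
fn execute(_f: &EntryFn, _args: &[Arg]) -> bool {
    true
}

// Repeatedly sweep over all functions until a pass makes no new progress.
fn surf(fns: &[EntryFn], wallet: &Wallet) -> (usize, usize) {
    let mut explored: HashSet<String> = HashSet::new();
    loop {
        let mut progress = false;
        for f in fns {
            if explored.contains(&f.name) {
                continue; // already succeeded; skip in subsequent iterations
            }
            // Inhabit every parameter, or give up on this function for now.
            let args: Option<Vec<Arg>> =
                f.params.iter().map(|p| wallet.inhabit(p)).collect();
            if let Some(args) = args {
                if execute(f, &args) {
                    explored.insert(f.name.clone());
                    progress = true; // may unlock dependents on the next pass
                }
            }
        }
        if !progress {
            break; // fixpoint: a full pass produced no new successful call
        }
    }
    (explored.len(), fns.len() - explored.len())
}

fn main() {
    let wallet = Wallet {
        owned: HashMap::from([(
            "0x2::devnet_nft::DevNetNFT".to_string(),
            vec!["0xabc".to_string()],
        )]),
    };
    let fns = vec![
        EntryFn { name: "mint".into(), params: vec![] },
        EntryFn {
            name: "update".into(),
            params: vec![ParamType::Object("0x2::devnet_nft::DevNetNFT".into())],
        },
    ];
    let (explored, failed) = surf(&fns, &wallet);
    println!("explored_functions: {explored}, failed_functions: {failed}");
}

The key design point is the outer fixpoint loop: a success in one pass can make previously failing functions callable in the next, so the sweep repeats until a full pass adds nothing new.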

This is limited in countless ways, and there are lots of exciting directions to extend it (e.g., using static analysis to inhabit types by calling a function that we know will create a value of that type; sketched below, after the output). But it's still reasonably effective as-is, so I wanted to go ahead and put up a PR. I tested it by spinning up a local network (it currently doesn't work with devnet due to API compatibility issues) and asking it to "surf" the Sui Framework. It was able to call about 18% of the functions.

# start local network
cargo run --bin sui-surfer -- --packages 0x2
# lots of intermediate printing
SurferStats { explored_packages: 1, explored_modules: 21, explored_functions: 34, failed_functions: 188 }
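To make the static-analysis extension mentioned above concrete, here is a hypothetical sketch: precompute which function creates each object type (in the real tool this mapping would come from analyzing module bytecode), and when a needed object is missing from the wallet, check whether its producer's own inputs can be inhabited first. All names below are illustrative, not part of sui-surfer:

use std::collections::{HashMap, HashSet};

// Minimal stand-ins; `produces` would come from static analysis of module
// bytecode rather than being declared by hand as it is here.
struct EntryFn {
    name: String,
    needs: Vec<String>,    // object type tags this function takes as input
    produces: Vec<String>, // object type tags this function creates
}

// Map each object type to one function known to create it.
fn producers(fns: &[EntryFn]) -> HashMap<&str, &EntryFn> {
    let mut map = HashMap::new();
    for f in fns {
        for t in &f.produces {
            map.entry(t.as_str()).or_insert(f);
        }
    }
    map
}

// To inhabit type `ty`, recursively inhabit its producer's inputs first
// (guarding against cycles so we never recurse forever).
fn inhabitable<'a>(
    ty: &'a str,
    prod: &HashMap<&'a str, &'a EntryFn>,
    visiting: &mut HashSet<&'a str>,
) -> bool {
    if !visiting.insert(ty) {
        return false; // cycle: bail out rather than loop
    }
    let ok = match prod.get(ty) {
        Some(f) => f.needs.iter().all(|dep| inhabitable(dep, prod, visiting)),
        None => false, // nothing creates this type; the call attempt fails
    };
    visiting.remove(ty);
    ok
}

fn main() {
    let fns = vec![
        EntryFn {
            name: "mint".into(),
            needs: vec![],
            produces: vec!["DevNetNFT".into()],
        },
        EntryFn {
            name: "update".into(),
            needs: vec!["DevNetNFT".into()],
            produces: vec![],
        },
    ];
    let prod = producers(&fns);
    // `update` needs a DevNetNFT and `mint` produces one, so the surfer
    // could call mint first instead of waiting for a lucky iteration order.
    let mut visiting = HashSet::new();
    println!(
        "DevNetNFT inhabitable: {}",
        inhabitable("DevNetNFT", &prod, &mut visiting)
    );
}

A real implementation would then actually call the producer and pick up the created object from the wallet; this sketch only answers the reachability question.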

sblackshear avatar Jun 02 '22 14:06 sblackshear

This looks like a great fuzz-test entry point (cc @tharbert)! A couple of Qs:

  1. How do we interpret the result stats, especially the failed_functions? Do we try to minimize this number?
  2. Is this a tool that we expect Move devs will use to test their packages? Or do we mainly use it for the 0x2 core modules?

longbowlu avatar Jun 03 '22 00:06 longbowlu

  1. How do we interpret the result stats, especially the failed_functions? Do we try to minimize this number?

These numbers only quantify the effectiveness of the surfer tool: failed_functions counts the functions that it was unable to call, or was able to call but the call resulted in an abort. A lower failed_functions count is always better, but reaching 0 may not be possible (e.g., because a silly Move programmer can write an un-callable function, or a function that always aborts).

  2. Is this a tool that we expect Move devs will use to test their packages? Or do we mainly use it for the 0x2 core modules?

This is intended for end-to-end smoke testing of the Sui validator code, RPC code, etc. in a live or local network. In addition, for benchmarking or smoke testing we are often looking for workloads that are more "interesting" or "realistic" than just payments, and running the surfer for a bit can generate such a workload with no effort from the programmer.

I don't think this is the easiest or best way to test Move code. The unit test framework has features for mocking objects, sending transactions from any address, measuring code coverage, etc. that would be hard to recreate here, and having an actual network adds an unnecessary layer of indirection. We have some collaborators at McGill who are working on integrating fuzzing for Move into the unit testing framework, which will be great!

sblackshear avatar Jun 03 '22 03:06 sblackshear

@sblackshear

It was able to call about 18% of the functions.

How can we do better? I'm interested... any ideas on how to get started?

seyyedaliayati avatar Mar 30 '23 18:03 seyyedaliayati