Support for executing WASM modules in repository_ctx

Open jmillikin opened this issue 1 year ago • 1 comments

Description of the feature request:

I would like to be able to execute a WASM module from a repository_rule implementation function, as an alternative to native binaries.

The API would look something like this:

repository_ctx.execute_wasm(path, input, timeout=600, entry_point=None)
  path: `string`; or `Label`; or `path`; required
      Path to a `.wasm` module to execute.
  input: `string`; required
      Input to provide to the module (not interpreted by Bazel).
  timeout: `int`
      Maximum duration of the command in seconds.
  entry_point: `string` or `None`
      If set, invoke the named export instead of the default entry point.

  return: `wasm_exec_result`
      field `output`: `string`
          Output from the module (not interpreted by Bazel).
      field `return_code`: `int`
          The return code returned after the execution of the module. 256 if execution
          was terminated by a timeout.

The repository rule implementation function is responsible for assembling an input string and parsing the output string according to its own needs -- for example, it might use JSON + the json module for structured input/output. The WASM module itself has no access to repository_ctx functionality.
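As a rough illustration of that contract, here is a plain-Python simulation (not Starlark, and not Bazel's implementation). `fake_module` and `run_helper` are hypothetical names, and treating every non-zero return code as a failure is an assumption of this sketch; only the "string in, string out, 256 on timeout" shape comes from the proposal above.

```python
import json

TIMEOUT_RETURN_CODE = 256  # per the proposal: reported when execution times out


def fake_module(input_str):
    """Hypothetical stand-in for a .wasm module: string in, (code, string) out."""
    request = json.loads(input_str)
    names = sorted(request["srcs"])
    return 0, json.dumps({"files": names})


def run_helper(module, request):
    """Encode the request, call the module, and decode its reply."""
    return_code, output = module(json.dumps(request))
    if return_code == TIMEOUT_RETURN_CODE:
        raise RuntimeError("module timed out")
    if return_code != 0:  # assumption: non-zero means failure
        raise RuntimeError("module failed with code %d: %s" % (return_code, output))
    return json.loads(output)


reply = run_helper(fake_module, {"srcs": {"b.go": "package b\n", "a.go": "package a\n"}})
```

The caller never sees anything but the two strings and the return code, so the choice of JSON stays entirely on the rule's side.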

From the WASM side, the API looks like this:

func example_entry_point(
    input_ptr: *uint8,
    input_len: uint32,
    output_ptr: **uint8,
    output_len: *uint32,
) -> uint8 /* return_code */
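To make the pointer conventions concrete, here is a small Python simulation of that calling sequence, using a `bytearray` as a stand-in for the module's linear memory. The memory layout (addresses 0, 4, 16, 512) and the `call_module` helper are hypothetical, and how the host actually reserves space for the input is not specified above; the sketch only assumes 32-bit little-endian pointers, as in wasm32.

```python
import struct

memory = bytearray(1024)  # stand-in for the module's linear memory


def example_entry_point(input_ptr, input_len, output_ptr, output_len):
    """Toy module: upper-cases its input and reports where the result lives."""
    data = bytes(memory[input_ptr:input_ptr + input_len])
    result = data.upper()
    result_addr = 512  # hypothetical fixed output buffer chosen by the module
    memory[result_addr:result_addr + len(result)] = result
    # Store the buffer's address and length through the two out-pointers.
    struct.pack_into("<I", memory, output_ptr, result_addr)
    struct.pack_into("<I", memory, output_len, len(result))
    return 0  # return_code, must fit in a uint8


def call_module(text):
    """Host side: copy the input in, invoke, then follow the out-pointers."""
    raw = text.encode("utf-8")
    input_ptr, output_ptr, output_len = 16, 0, 4  # hypothetical layout
    memory[input_ptr:input_ptr + len(raw)] = raw
    code = example_entry_point(input_ptr, len(raw), output_ptr, output_len)
    (addr,) = struct.unpack_from("<I", memory, output_ptr)
    (size,) = struct.unpack_from("<I", memory, output_len)
    return code, bytes(memory[addr:addr + size]).decode("utf-8")
```

The module decides where its output lives and the host only reads the address and length back, which is why `output_ptr` is a pointer to a pointer.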

Which category does this issue belong to?

External Dependency

What underlying problem are you trying to solve with this feature?

Generating BUILD and .bzl files within a repository rule improves the user experience when adapting third-party code to build with Bazel. When the generation logic is too complex to write in Starlark, rulesets often rely on a helper binary for language-specific logic (e.g. enumerating imports).

  • These helper binaries must either be pre-compiled for specific platforms, or compiled as part of the repository rule. Either approach has significant limitations for portability and hermeticity.
  • Repository rules executing locally aren't sandboxed and have full access to the filesystem and network (unless Bazel itself is sandboxed).

Embedding a WASM interpreter into Bazel and letting it execute WASM modules as part of repository rules can enable a different approach, where pre-compiled .wasm modules ship with the rules and can be executed on any platform that can run Bazel itself.

Keeping the API very small (string input, string output) keeps the maintenance burden on Bazel itself to a minimum, with no need to worry about things like WASI or how to do WASM <-> JVM FFI.

Which operating system are you running Bazel on?

Linux (x86-64), macOS (aarch64)

What is the output of bazel info release?

release 7.1.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

jmillikin avatar May 22 '24 05:05 jmillikin

I've built a proof-of-concept for this feature using Chicory, a WASM runtime written in Java. It isn't large (maybe ~500 LoC excluding tests), and it's able to run modules written in Go (via TinyGo) and Rust. Chicory itself is small (~161 kB) and has no additional dependencies.

Would the Bazel devs be interested in reviewing an implementation PR?

Demo repository rule:

def _wasm_demo(ctx):
    # Would be obtained from ctx.download_and_extract() normally
    ctx.file("example/src.go", "package main\n")

    # Inputs are assembled by Bazel, using path.readdir() / ctx.read(), etc
    srcs = {"example/src.go": ctx.read("example/src.go")}

    result = ctx.execute_wasm(
        path = ctx.attr._demo_wasm,
        function = "wasm_repo_demo",
        input = json.encode({"srcs": srcs}),
    )

    # Output contains instructions to the rule as to which changes
    # to apply, for example writing/patching BUILD files.
    output = json.decode(result.output)
    print("output: %s" % (repr(output),))
    ctx.file("BUILD.bazel", "")
    ctx.file("WORKSPACE.bazel", "")

wasm_demo = repository_rule(
    _wasm_demo,
    attrs = {
        "_demo_wasm": attr.label(
            default = "@//:demo.wasm",
            allow_single_file = True,
        )
    }
)
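For the "instructions to the rule" step, one possible shape is a map of files to write. The Python sketch below simulates that outside Bazel; the `{"write": {...}}` schema and the `apply_output` helper are hypothetical and not part of the proof-of-concept. In a real rule the write would presumably go through `ctx.file()` instead.

```python
import json
import pathlib
import tempfile


def apply_output(repo_root, output_str):
    """Write each file named in the (hypothetical) {"write": {...}} reply."""
    root = pathlib.Path(repo_root).resolve()
    written = []
    for rel_path, content in sorted(json.loads(output_str).get("write", {}).items()):
        target = (root / rel_path).resolve()
        # Refuse paths that would escape the repository directory.
        if root not in target.parents:
            raise ValueError("path escapes repository: %s" % rel_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        written.append(rel_path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    reply = json.dumps({"write": {"example/BUILD.bazel": "# generated\n"}})
    written = apply_output(tmp, reply)
    content = (pathlib.Path(tmp) / "example/BUILD.bazel").read_text()
```

Because the module only returns data, the rule stays in control of what actually touches the repository directory.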

jmillikin avatar Jun 30 '24 12:06 jmillikin

Could this be realized by a third-party Starlark library that either 1) downloads the Chicory JAR and executes it with Bazel's embedded JDK (slight hack) or 2) downloads a Graal Native Image of Chicory and then runs it? That would make it possible to update Chicory separately from Bazel releases.

fmeum avatar Jun 30 '24 19:06 fmeum

> 2) downloads a Graal Native Image of Chicory and then runs it?

The advantage of Chicory is that it runs in a JVM. If the interpreter is to be distributed as a native executable, then WAMR would be a better approach. And downloading a native WASM interpreter wouldn't provide much benefit over simply downloading natively-compiled versions of a repository generator tool, with similar downsides with regard to portability.

> 1) downloads the Chicory JAR and executes it with Bazel's embedded JDK

I'm not sure how this would work -- is there a way to access Bazel's Java runtime from within a repository rule? My understanding is that it gets bundled into the bazel executable and unpacked to a temporary directory, and I don't see a way to locate that directory from a repository_ctx or module_ctx.

Using rules_java~~toolchains~remotejdk* to run a .jar would work, but downloading a JRE is the same as downloading a WASM interpreter.

> That would make it possible to update Chicory separately from Bazel releases.

Is that an important goal? According to Chicory's roadmap to v1.0, the functionality that Bazel would use (an interpreter that implements the WASM v1.0 spec) is complete.

The functionality yet to be implemented is less necessary for the Bazel use case:

  • Validation isn't as important, since the .wasm file comes from the ruleset. Bazel rulesets are commonly trusted to download and execute executables, so even partially-implemented validation is better than the current state.
  • SIMD doesn't have much value for BUILD file generators.
  • Extensions such as WASI, GC, and Threads are unnecessary for BUILD file generators.
  • Ahead-of-time compilation of WASM to JVM bytecode is unnecessary (and probably unwanted).

jmillikin avatar Jun 30 '24 23:06 jmillikin

> And downloading a native WASM interpreter wouldn't provide much benefit over simply downloading natively-compiled versions of a repository generator tool, with similar downsides with regards to portability.

Yes, this is certainly less convenient, but I wonder how much: The interpreter would only need to be downloaded once and could then be used by arbitrarily many repo rules. OS/arch detection in repo rules isn't great, but it's ultimately using the same source of truth as Bazel itself (JVM system properties) and so shouldn't introduce additional portability concerns.
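As a rough sketch of what that detection involves, the Python function below normalizes JVM-style `os.name` / `os.arch` strings into a key for picking a download. In a repository rule the inputs would come from `repository_ctx.os.name` and `repository_ctx.os.arch`; the `platform_key` name, the alias table, and the output format are hypothetical.

```python
def platform_key(os_name, os_arch):
    """Map JVM-style os.name / os.arch strings to a hypothetical download key."""
    name = os_name.lower()
    arch = os_arch.lower()
    if name.startswith("linux"):
        os_part = "linux"
    elif name.startswith("mac os"):
        os_part = "macos"
    elif name.startswith("windows"):
        os_part = "windows"
    else:
        raise ValueError("unsupported OS: %s" % os_name)
    # Different JVMs report the same CPU under different names.
    arch_aliases = {
        "amd64": "x86_64",
        "x86_64": "x86_64",
        "aarch64": "aarch64",
        "arm64": "aarch64",
    }
    if arch not in arch_aliases:
        raise ValueError("unsupported architecture: %s" % os_arch)
    return "%s-%s" % (os_part, arch_aliases[arch])
```

Every ruleset that ships native helpers ends up maintaining a table like this, and sharing one interpreter download would at least centralize it.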

Thanks to Cosmopolitan, we could potentially even use a single binary across all platforms: https://github.com/wasm3/wasm3

The main advantage of a solution outside Bazel is that rulesets could immediately adopt it rather than waiting until, say, 7.3.0 is their minimum supported version of Bazel.

fmeum avatar Jul 03 '24 09:07 fmeum

I think if someone wanted to write a WASM interpreter binary for use by the Bazel rule ecosystem, they would have done so already. And the existence of native WASM support within Bazel would not prevent someone from doing so, should they feel inspired.

jmillikin avatar Jul 03 '24 15:07 jmillikin

Sorry for neglecting this previously. After learning more about WebAssembly and reading through the proposals, I think this is a very reasonable feature request.

But I feel we need a more detailed analysis of the pros and cons of the different options:

  • Use Chicory with Bazel's embedded JVM
  • Download the WASM runtime with a rule
  • Embed a platform-dependent WASM runtime
  • Embed a platform-independent WASM runtime (wasm3)

@jmillikin Do you think it's worth creating a design doc for this? The repository API could also be different depending on which path we choose.

FYI @Wyverald @coeuvre @meisterT @lberki, WDYT?

meteorcloudy avatar Mar 07 '25 09:03 meteorcloudy

Sure, I'd be happy to put together a design doc. It looks like https://github.com/bazelbuild/proposals is still in use, so I'll start from its template.

Who should be on the reviewers: list for the design?

You might also be interested in my branch at https://github.com/jmillikin/upstream__bazel/commits/repo-rule-execute-wasm/, which has an example of using a WebAssembly helper to parse a TOML file.

jmillikin avatar Mar 07 '25 10:03 jmillikin

You can add me and all the people I mentioned in the previous comment.

meteorcloudy avatar Mar 07 '25 10:03 meteorcloudy

I'm not too hot about this. Reason being, we already have a scripting language inside Bazel called Starlark and embedding a WASM interpreter would give Bazel another extension point. If we think that Starlark does not suffice, we should either make Starlark good enough for the purpose or allow calling out to another language more easily, but I don't see how having two extension interfaces is better than one.

This would also raise interesting questions about running foreign code in the address space of Bazel. I'm not as much worried about security as I am about RAM/CPU use bloat. Currently, the rule "no user code runs in the address space of Bazel except Starlark" makes finding the culprits of regressions much simpler.

lberki avatar Mar 07 '25 12:03 lberki

Design proposal: https://github.com/bazelbuild/proposals/pull/402

jmillikin avatar Mar 12 '25 12:03 jmillikin

A fix for this issue has been included in Bazel 8.3.0 RC1. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=8.3.0rc1. Thanks!

iancha1992 avatar Jun 16 '25 21:06 iancha1992