components-contrib icon indicating copy to clipboard operation
components-contrib copied to clipboard

middleware: changes wasm basic to use waPC

Open codefromthecrypt opened this issue 3 years ago • 32 comments

Description

Updates wasm basic middleware and simplifies some code inside of it, notably by using waPC.

This changes the entrypoint function to be named "rewrite" and uses a pool because known memory allocators are not goroutine safe.

FYI wazero's first beta tag will be in two months https://github.com/tetratelabs/wazero/issues/506#issuecomment-1162489846

Issue reference

N/A

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

  • [x] Code compiles correctly
  • [x] Created/updated tests
  • [n/a] Extended the documentation / Created issue in the https://github.com/dapr/docs/ repo: dapr/docs#[issue number]

codefromthecrypt avatar Jun 30 '22 14:06 codefromthecrypt

@daixiang0 can you review this since you know more about WASM and the Wazero library?

Also we want more information on the structure of the WASM that can be loaded. A compiled WASM sample is obviously not good enough :)

berndverst avatar Jun 30 '22 16:06 berndverst

@codefromthecrypt thanks for your contribution! Since wazero is not stable now, it will continue to iterate.

Maybe we can get UUID from context. For the logging, I would like to make it detailed to help debug.

Adding more support is my plan, such as support handling HEADER/response changes, while not having much time to do it now.

daixiang0 avatar Jul 01 '22 03:07 daixiang0

@daixiang0 cool I can do a UUID like https://github.com/wapc/wapc-go/blob/master/engines/wazero/wazero.go#L303

Should we start by assuming the wasm binary changes per request? or should we assume it doesn't?

codefromthecrypt avatar Jul 01 '22 05:07 codefromthecrypt

That is an interesting question, load once would get better performance. If wazero supports it, we can use it, and compare SHA256 of the wasm binary, if changes, we reload it.

daixiang0 avatar Jul 01 '22 05:07 daixiang0

OK I've refactored now, though feels there are some API glitches. Have a look at the impl and if there's any more stuff here I can do that before backfilling tests.

  • GetHandler isn't context aware, which can make things like tracing or profiling difficult (initialization logic)
  • GetHandler is a func, not an interface, which means what's returned can't be closed. This means some leaks
    • One way is to make Middleware closeable and then move the runtime to a field in that.

codefromthecrypt avatar Jul 02 '22 01:07 codefromthecrypt

At least a README should be made in how to generate the wasm

one other thing I think would be helpful is that the "run" function isn't documented or linked in a README or something (at least not that I can find). I think the source that "hello.wasm" was compiled from should be checked in, with instructions on how it was compiled. (ex. tinygo build -o hello.wasm -scheduler=none --no-debug -target=wasi hello.go.

Alternative ABI

Right now, "run" is allocating memory for the URI using tinygo specific allocation. So, ex this won't work with other language compilers. Another way is to not do allocation externally.

WASI-only

Instead of passing memory in and out, you might consider using WASI instead. Doing so would be more portable as you can use other languages.

For example, you can set moduleConfig.args = "rewrite_request_uri", uri and moduleConfig.stdout = buf

then just instantiate the module and see if the stdout is empty or not, and if not assign it to SetRequestURIBytes

WASI+callback

reading stdout is easy, but could result in larger code depending on how people write to it. Another way is to have them call a custom host function if they need to change the URI. So, in this case a host module is instantiated per request which dispatches "set_request_uri(offset, len)" to req.SetRequestURIBytes. The wasm people generate import this function and use that instead of stdout.

Callback only

instead of having the wasm parse os.Args (ex that "rewrite_request_uri" == arg0), the entrypoint function could be like you have now.. "run" aka "rewrite_request_uri". It wouldn't accept any args, as it is hard to know which memory offsets are unused by the compiler. Instead, there's a host function to get the current uri (ex "get_request_uri"). So on each request "rewrite_request_uri" is invoked which might call "get_request_uri" if it was enabled, and then might call "set_request_uri" if it decided to rewrite it.

Your choice here

There's no perfect way, and also there are other tools that have callback flows like wapc. This is mainly to get the thinking down. Even the current approach is fine, if documented. Just it needs tinygo only for wasm generation. That may be fine!

codefromthecrypt avatar Jul 02 '22 01:07 codefromthecrypt

Very cool! The basic WASM support is the first step, actually, I am thinking about Envoy WASM ABI implementation, that will create a bridge between service mesh and application runtime. For basic design, I think it is simple enough to start with.

Could you add an example in dapr/examples for this and update related documents?

daixiang0 avatar Jul 04 '22 06:07 daixiang0

cool. I'll polish this up where basic means URI rewrite. I'll go ahead and update the exported function name to "rewrite_request_uri", also, unless you've strong feelings towards "run"

codefromthecrypt avatar Jul 04 '22 23:07 codefromthecrypt

That is good, go ahead!

daixiang0 avatar Jul 05 '22 02:07 daixiang0

note I hit an unrelated code land mine but back together and will pick this up tomorrow.

codefromthecrypt avatar Jul 06 '22 09:07 codefromthecrypt

@daixiang0 I've updated the code here.

Could you add an example in dapr/examples for this and update related documents?

sorry if I'm a bit daft, but can you be precise where you want to update including both the repo and the directory names?

codefromthecrypt avatar Jul 07 '22 08:07 codefromthecrypt

sorry if I'm a bit daft, but can you be precise where you want to update including both the repo and the directory names?

I mean https://github.com/dapr/samples, you can add a WASM sample.

daixiang0 avatar Jul 12 '22 08:07 daixiang0

@daixiang0 cool for the sample. is there any thing else to do in this repo? I would like to have this land before updating https://github.com/dapr/samples so that the instructions can say the right function name.

codefromthecrypt avatar Jul 12 '22 08:07 codefromthecrypt

ok ready on my side, when this is merged I'll follow up on the sample no prob!

codefromthecrypt avatar Jul 14 '22 10:07 codefromthecrypt

@daixiang0 feedback sounds good. I generally make a sub-module readme instead of making a large one for the whole directory. I'll give it a go and if we need another revision no problem.

codefromthecrypt avatar Jul 14 '22 23:07 codefromthecrypt

Also please fix CI.

daixiang0 avatar Jul 15 '22 01:07 daixiang0

ok I've fixed everything. One thing I would like to say is that while we should merge this I think we should refactor before recutting the documentation.

Generally, I think it is too difficult to use this as generating the wasm is burdensome. People have to know too much for example memory offsets and such. I would recommend instead we either switch to a WASI approach as mentioned here or switch to wapc-go cc @pkedy. Either one I'm happy to do as a follow-up.

I also think that it is unlikely someone will be able to use something besides wazero, so allowing that configuration while seemingly open has limited usefulness. For example, if you add wasmtime or wasmer, you'd end up in the same situation of wapc-go where they cannot load at the same time due to shared library conflicts.

In summary, I think if the first goal is to show someone something easy, I think we should switch this to wapc-go+wazero and make it easy that way. This would also support languages besides tinygo. For future work, such as proxy-host, going straight to wazero makes more sense as it would be more efficient and the wasm side can leverage existing compilers. However, I'm fine to continue this course however you decide!

codefromthecrypt avatar Jul 15 '22 05:07 codefromthecrypt

WASM can be considered as a single field although most program things can be brought from other languanges, like Golang/Rust, it still needs some extra knowledge.

In the first step, I introduce its support, which is really a very initial version needed to iterate. Replace request URI is one function, let's add more functions if you want.

Since now we do not want to enable CGO, the runtime config may be useless, but after CGO support, we can quickly add other WASM runtimes which need CGO.

wapc-go is really cool, you can optimize with it.

daixiang0 avatar Jul 15 '22 05:07 daixiang0

@daixiang0 cool so I will optimize for wapc-go and note that to use other runtimes is a build flag not a runtime one. wapc-go works for that (I implemented it!) I think this is a good path because it makes the first time experience less painful. I will add a commit to do this with wapc so we can validate.

codefromthecrypt avatar Jul 15 '22 05:07 codefromthecrypt

ok I switched to wapc-go and i think you'll agree it is a lot simpler now, both in the host and also how users implement the guest side.

codefromthecrypt avatar Jul 16 '22 00:07 codefromthecrypt

Since now we do not want to enable CGO, the runtime config may be useless, but after CGO support, we can quickly add other WASM runtimes which need CGO.

Concretely, WASM runtimes that use CGO (wasmer, wasmtime) import the same WASI C header file. This causes a conflict which prevents them from being compiled in the same binary. This wouldn't be the case if they used different namespace, but it is also unlikely one will change. So, later, keep this in mind as basically dapr would need to be rebuilt for each runtime that conflicts like this. Some may be fine doing that as they are using custom binaries anyway.

codefromthecrypt avatar Jul 16 '22 01:07 codefromthecrypt

This shouldn't be a problem. We will not have CGO enabled in Dapr "ever". What we'll do is enable adding other components that communicate via gRPC. Because those will be in different processes, they can use CGO freely and don't risk conflicts.

ItalyPaleAle avatar Jul 16 '22 01:07 ItalyPaleAle

thanks for the steering @ItalyPaleAle

FYI and more for the others on this thread... I bumped https://github.com/dapr/dapr/issues/3787#issuecomment-1186061176 with some thoughts as I think exactly this plugin could help people learn a bit about WebAssembly as a potential plugin model alongside gRPC (at least for things that can be handled locally). Agreed that gRPC for plugin arch is fairly common and well understood means to control dependencies, including build or platform libraries.

codefromthecrypt avatar Jul 16 '22 02:07 codefromthecrypt

here's the doc PR https://github.com/dapr/docs/pull/2663

codefromthecrypt avatar Jul 18 '22 03:07 codefromthecrypt

@Taction @ItalyPaleAle please help review and merge.

daixiang0 avatar Jul 22 '22 00:07 daixiang0

@codefromthecrypt Thanks for your great work! LGTM overall, just a nit change.

Taction avatar Jul 22 '22 03:07 Taction

hi there. I'm wondering if there are any more steps needed. I'm attending a couple devops events (wellington and singapore) over the next couple weeks, and was hoping to point to this as a neat way to start learning about webassembly and extension. Any idea when this and the docs might be mergeable cc @msfussell

codefromthecrypt avatar Aug 01 '22 12:08 codefromthecrypt

@codefromthecrypt Also please resolve the conflicts.

Taction avatar Aug 05 '22 06:08 Taction

there's an API change under play so I'll update this again post merge of https://github.com/wapc/wapc-go/pull/35

codefromthecrypt avatar Aug 06 '22 04:08 codefromthecrypt

Codecov Report

Merging #1833 (850d6eb) into master (9c9df2f) will increase coverage by 0.01%. The diff coverage is 65.90%.

@@            Coverage Diff             @@
##           master    #1833      +/-   ##
==========================================
+ Coverage   37.82%   37.83%   +0.01%     
==========================================
  Files         192      192              
  Lines       24094    24103       +9     
==========================================
+ Hits         9114     9120       +6     
- Misses      14209    14216       +7     
+ Partials      771      767       -4     
Impacted Files Coverage Δ
middleware/http/wasm/basic/basic.go 66.01% <65.90%> (+9.63%) :arrow_up:
state/in-memory/in_memory.go 42.58% <0.00%> (-3.43%) :arrow_down:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Aug 06 '22 09:08 codecov[bot]