libsql Allow registering and executing WebAssembly functions

Open psarna opened this issue 1 year ago • 15 comments

This draft implements a mechanism for registering and running Wasm functions. The current runtime of choice is wasmtime and its libwasmtime.so library with C bindings (but a switch to Rust should be considered, because that's the native language of wasmtime and the only interface which offers all of its features).

It operates on a very crude ABI (ref:#16), where ints and doubles are passed to/from WebAssembly as is, and for strings/blobs/null it passes a pointer to a structure:

string: [1 byte for type specification][data]
blob: [1 byte for type specification][4 bytes of size][data]
null: [1 byte for type specification]
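
To make the layout above concrete, here is a minimal sketch of the encoding in Python. Several details are assumptions that the description does not confirm: the tag values (borrowed here from SQLite's fundamental type codes), a NUL terminator on strings, and a big-endian size field for blobs. The names encode_value and decode_value are hypothetical.

```python
import struct

# Hypothetical type tags: this sketch borrows SQLite's fundamental type codes,
# which the description above does not confirm.
TAG_TEXT = 3
TAG_BLOB = 4
TAG_NULL = 5


def encode_value(value):
    """Serialize a str, bytes or None into the byte layout described above."""
    if value is None:
        return bytes([TAG_NULL])
    if isinstance(value, str):
        # Assumption: the string is UTF-8 and NUL-terminated, since no length is stored.
        return bytes([TAG_TEXT]) + value.encode("utf-8") + b"\x00"
    if isinstance(value, (bytes, bytearray)):
        # Assumption: the 4-byte size is an unsigned big-endian integer.
        return bytes([TAG_BLOB]) + struct.pack(">I", len(value)) + bytes(value)
    raise TypeError("ints and doubles are passed as is, not through a pointer")


def decode_value(buf):
    """Inverse of encode_value for a buffer that starts at the pointed-to address."""
    tag = buf[0]
    if tag == TAG_NULL:
        return None
    if tag == TAG_TEXT:
        end = buf.index(0, 1)  # first NUL byte after the tag
        return buf[1:end].decode("utf-8")
    if tag == TAG_BLOB:
        (size,) = struct.unpack_from(">I", buf, 1)
        return bytes(buf[5:5 + size])
    raise ValueError(f"unknown type tag {tag}")
```

In the real ABI the pointer refers to Wasm linear memory; the sketch only models the bytes found at that address.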

The way it's implemented now is twofold:

An internal run_wasm function, capable of running WebAssembly and translating parameter types to and from the Wasm module.
A dynamic lookup table, currently a regular SQL table: CREATE TABLE libsql_wasm_func_table(name text PRIMARY KEY, body text). At the time of this writing, it also needs to be created and filled manually.

After creating and filling the new meta-table, when a function call is used in a statement, e.g. SELECT id, fib(id) FROM t, and function fib is neither built-in nor user-defined, it will be looked up in the table. If found, its body is assumed to hold valid WebAssembly code, which is then compiled and run.
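
The lookup order described above can be sketched with Python's sqlite3 module. This only illustrates the fallback logic and is not the code in this PR: KNOWN_FUNCTIONS and resolve_function are hypothetical names, and the stored body is a placeholder that is never compiled.

```python
import sqlite3

# Stand-in for the functions the engine already knows (built-in or user-defined).
KNOWN_FUNCTIONS = {"abs", "length", "upper"}


def resolve_function(conn, name):
    """Return ("native", None) or ("wasm", body); raise if the name is unknown."""
    if name in KNOWN_FUNCTIONS:
        return ("native", None)
    row = conn.execute(
        "SELECT body FROM libsql_wasm_func_table WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no such function: {name}")
    # The body is assumed to be valid WebAssembly; compiling and running it
    # is out of scope for this sketch.
    return ("wasm", row[0])


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE libsql_wasm_func_table(name text PRIMARY KEY, body text)"
)
conn.execute(
    "INSERT INTO libsql_wasm_func_table (name, body) VALUES (?, ?)",
    ("fib", "(module)"),
)
```

Calling resolve_function(conn, "fib") returns the stored body, while a name found in neither place raises LookupError.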

To enable WebAssembly integration, run ./configure with the --enable-wasm-runtime parameter.

A few example WebAssembly-based user-defined functions coded in Rust can be found here: https://github.com/psarna/libsql_bindgen

Here's an inline demo for testing purposes, with a WebAssembly Fibonacci function already compiled from Rust and pasted inline:

CREATE TABLE IF NOT EXISTS libsql_wasm_func_table(name text PRIMARY KEY, body text);

INSERT INTO libsql_wasm_func_table (name, body) VALUES ('fib', '
(module 
 (type (;0;) (func (param i64) (result i64))) 
 (func $fib (type 0) (param i64) (result i64) 
 (local i64) 
 i64.const 0 
 local.set 1 
 block ;; label = @1 
 local.get 0 
 i64.const 2 
 i64.lt_u 
 br_if 0 (;@1;) 
 i64.const 0 
 local.set 1 
 loop ;; label = @2 
 local.get 0 
 i64.const -1 
 i64.add 
 call $fib 
 local.get 1 
 i64.add 
 local.set 1 
 local.get 0 
 i64.const -2 
 i64.add 
 local.tee 0 
 i64.const 1 
 i64.gt_u 
 br_if 0 (;@2;) 
 end 
 end 
 local.get 0 
 local.get 1 
 i64.add) 
 (memory (;0;) 16) 
 (global $__stack_pointer (mut i32) (i32.const 1048576)) 
 (global (;1;) i32 (i32.const 1048576)) 
 (global (;2;) i32 (i32.const 1048576)) 
 (export "memory" (memory 0)) 
 (export "fib" (func $fib)))
');

CREATE TABLE IF NOT EXISTS example(id int PRIMARY KEY);
INSERT OR REPLACE INTO example(id) VALUES (7);
INSERT OR REPLACE INTO example(id) VALUES (8);
INSERT OR REPLACE INTO example(id) VALUES (9);
SELECT id, fib(id) FROM example;
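
As a cross-check of the demo, the $fib function above can be transliterated by hand into Python. This is a reading aid rather than part of the PR. Under this reading the function computes the usual Fibonacci numbers, so the rows with id 7, 8 and 9 should yield 13, 21 and 34.

```python
MASK64 = (1 << 64) - 1


def fib(n):
    """Transliteration of the $fib function from the WebAssembly text above."""
    n &= MASK64            # i64.lt_u and i64.gt_u compare the argument as unsigned
    total = 0              # local 1
    if n >= 2:             # the block is skipped when n < 2
        while True:        # loop @2
            total = (total + fib(n - 1)) & MASK64
            n = (n - 2) & MASK64
            if not n > 1:  # br_if 0 repeats the loop while n > 1
                break
    return (n + total) & MASK64
```

Because the comparisons are unsigned, a negative argument is treated as a very large number, so the sketch should only be called with small non-negative values.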

This series also comes with syntactic sugar for registering and deregistering Wasm functions dynamically via SQL: CREATE FUNCTION and DROP FUNCTION.

Fixes #18

Fixes #17

Oct 13 '22 13:10 psarna

This is only a draft for a multitude of reasons, the most important ones being:

[x] lack of automated tests
[x] currently, invoking Wasm-based user-defined functions causes an explicit memory leak during lookup - these functions need to be tracked and cached (also to avoid Wasm recompilation) - namely, once registered dynamically, the function should simply end up on the list of all the other user-defined functions

Oct 13 '22 13:10 psarna

Great work! Perhaps you could consider CNCF's WasmEdge, which has a well maintained C SDK with LLVM-based AOT support for embedding. :)

https://github.com/wasmedge/wasmedge

https://wasmedge.org/book/en/sdk/c.html

Disclaimer: I am a maintainer at WasmEdge. We helped Nebula Graph and TiDB to support similar Wasm UDFs in their SQL DBs.

Oct 13 '22 19:10 juntao

@juntao I actually looked it up earlier today, we're definitely interested in giving it a go! And, eventually, make the implementation runtime-agnostic by relying on Wasm C API (https://github.com/WebAssembly/wasm-c-api) that @losfair mentioned in another issue.

I remember from my morning research that the C dynamic library from WasmEdge release page was ~50MB, which is quite heavy compared to libwasmtime's 17MB - are you aware of any thinner versions of it?

Oct 13 '22 20:10 psarna

Yes. I believe WasmEdge supports the standard C API -- I will confirm.

The WasmEdge dynamic library really should not be that big. The distribution binary of WasmEdge is only 8MB. I think the large version contains LLVM so that it can do AOT compilation without an external dependency. Let me double-check and get back to you. Thank you!

Oct 13 '22 21:10 juntao

I remember from my morning research that the C dynamic library from WasmEdge release page was ~50MB, which is quite heavy compared to libwasmtime's 17MB - are you aware of any thinner versions of it?

Hi, the official release contains the ahead-of-time compiler (with LLVM inside), so it takes more space. However, if you are looking for a tiny version, we have wasmedge/slim-runtime, which is the runtime only, without the compiler inside.

$ file libwasmedge.so.0.0.0
libwasmedge.so.0.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=24d0e767d4f29f65dedc1c334e72e2320b6d391c, stripped
$ size libwasmedge.so.0.0.0
   text	   data	    bss	    dec	    hex	filename
1806540	  45848	    904	1853292	 1c476c	libwasmedge.so.0.0.0
~= 1.8M

Oct 14 '22 07:10 hydai

Thank you @hydai

@psarna I think the 1.8MB WasmEdge runtime library is sufficient for your use case. Developers can compile their functions to regular Wasm in any tool they choose. They can further use the wasmedgec tool to do AOT compiling before submitting the Wasm file to the database engine. The 1.8MB runtime library can handle both cases.

Ref: https://wasmedge.org/book/en/quick_start/run_in_aot_mode.html

Also, we do not yet support the proposed "standard" C API. But it could be supported if there is user demand. :)

Oct 14 '22 08:10 juntao

Splendid, thanks guys! 1.8MiB sounds way more aligned with edge use cases indeed, will give it a try

Oct 14 '22 08:10 psarna

v2:

.wat source code is now precompiled to a wasm module during initialization, once
each valid wasm function is now dynamically registered as a user-defined function, so it's not recompiled on consecutive executions
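
The compile-once behaviour from the v2 notes above can be illustrated with a toy registry. Everything in it is hypothetical: WasmFunctionRegistry is not a name from this PR, and fake_compile stands in for the real .wat compiler by turning a numeric string into a multiplier function.

```python
class WasmFunctionRegistry:
    """Toy registry that compiles each function body at most once."""

    def __init__(self, compile_fn):
        self._compile = compile_fn   # hypothetical stand-in for the real compiler
        self._compiled = {}          # function name -> compiled artifact
        self.compile_count = 0

    def register(self, name, source):
        """Compile the source once and keep the result under the function name."""
        self._compiled[name] = self._compile(source)
        self.compile_count += 1

    def call(self, name, *args):
        """Run an already registered function without recompiling it."""
        return self._compiled[name](*args)


def fake_compile(source):
    """Pretend compilation: the 'module' multiplies its argument by int(source)."""
    factor = int(source)
    return lambda x: x * factor


registry = WasmFunctionRegistry(compile_fn=fake_compile)
registry.register("double", "2")
results = [registry.call("double", i) for i in range(3)]
```

After three calls, registry.compile_count is still 1, which is the property the v2 change is after.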

Still to do: automated tests

Oct 17 '22 10:10 psarna

v3:

multiple fixes
added CREATE FUNCTION statement
added DROP FUNCTION statement

TODO:

[x] automated tests
[x] docs

Oct 19 '22 14:10 psarna

Great work. We are also working on using Wasm to provide UDFs in the openGauss database. In our work, we provide an init function that loads the Wasm module from a local .wasm or .wat file and parses it to get information about the exported functions. Users can then look up the exported function signatures through the tables we provide.

We also made a demo that runs Wasm code in the openGauss database, along with a Docker image for trying it out. You can find the project and Docker image info below. https://github.com/Nelson-He/openGauss-wasm

I hope we can keep in touch and exchange thoughts further.

Oct 20 '22 02:10 Nelson-He

@penberg please don't merge just yet as it needs more quality Rust tests, covering all types and so on, but I'm marking this PR as ready for review, because the main code is.

Oct 20 '22 16:10 psarna

@penberg I added a few more tests, I'm yet to produce docs (mentioning that the thing is experimental), but code review is welcome!

Oct 25 '22 19:10 psarna

self note: there's a spontaneous double free caused by dropping a wasm function, will investigate tomorrow

Oct 25 '22 19:10 psarna

(false alarm, the bug was in the tests)

Oct 25 '22 19:10 psarna

I added an initial doc explaining Wasm functions and configuration

Oct 26 '22 14:10 psarna

todo based on manual tests: CREATE FUNCTION should accept not only string literals, but arbitrary expressions - that would allow using readfile for uploading large .wat files

Nov 01 '22 10:11 psarna

Another todo: we should seriously consider switching to storing raw .wasm binary blobs exclusively, or at least add such support next to .wat files. Wat files are human readable and more verbose, but also more bloated, and require compilation. .wasm on the other hand would allow us to use considerably lighter runtimes (e.g. WasmEdge in slim mode) and would not require a compilation step
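
If both formats end up being accepted, the two can be told apart cheaply, because every WebAssembly binary starts with the four bytes \0asm followed by a version field. Below is a small sketch of such a check. classify_module is a hypothetical helper, and treating any body that starts with "(" as .wat text is a heuristic assumption.

```python
WASM_MAGIC = b"\x00asm"               # first four bytes of every binary module
WASM_VERSION_1 = b"\x01\x00\x00\x00"  # version field, little-endian 1


def classify_module(body):
    """Guess whether a stored function body is binary .wasm or textual .wat."""
    if isinstance(body, str):
        body = body.encode("utf-8")
    if body[:4] == WASM_MAGIC:
        if body[4:8] != WASM_VERSION_1:
            raise ValueError("unsupported WebAssembly binary version")
        return "wasm"
    # Heuristic assumption: anything else that starts with "(" is .wat text.
    if body.lstrip().startswith(b"("):
        return "wat"
    raise ValueError("body is neither a .wasm binary nor .wat text")
```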

Nov 02 '22 13:11 psarna

v4:

blobs are supported

Nov 03 '22 14:11 psarna

@penberg @glommer I would appreciate external input on a decision:

Right now, this PR abuses the modularity of libSQL a little, because functions are registered at parsing stage, not later, when the virtual machine executes the statement. Ideally, the only effect of the parsing stage should be vdbe opcodes, to be executed by vdbe later. Implementing it the proper way would take more time and be more intrusive (a new opcode, incompatible with sqlite, would be introduced), but, well, it will be more proper. Do you think I should go ahead and add it to this PR, or move it to a follow-up? Backward compatibility will be preserved for users anyway, but if we prefer this PR to be self-sufficient and solid, it's better if I spend more time on it and add a new opcode on top.

Nov 03 '22 14:11 psarna

edit: I think I don't need external input after all. Without doing it properly, things like the EXPLAIN statement get broken, so it's not really acceptable. I'll provide v5 with proper new opcodes as soon as possible

Nov 03 '22 15:11 psarna

v5:

opcodes for creating and dropping functions are added
EXPLAIN now works, verified with a Rust test
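
A toy model may help show why moving registration from the parser into opcodes fixes EXPLAIN. In the sketch below, parsing only produces a program and all side effects happen at execution time, so EXPLAIN can list the program without registering anything. The opcode names and the simplified statement form (CREATE FUNCTION followed only by a name) are hypothetical and do not reflect the actual libSQL opcodes or syntax.

```python
def parse(statement):
    """Turn a statement into a list of opcodes without touching any state."""
    words = statement.split()
    explain = words[0].upper() == "EXPLAIN"
    if explain:
        words = words[1:]
    verb = " ".join(words[:2]).upper()
    if verb == "CREATE FUNCTION":
        program = [("CreateFunction", words[2])]   # hypothetical opcode name
    elif verb == "DROP FUNCTION":
        program = [("DropFunction", words[2])]     # hypothetical opcode name
    else:
        raise ValueError("unsupported statement in this sketch")
    return explain, program


def run(statement, functions):
    """Execute a statement, or only list its opcodes when prefixed with EXPLAIN."""
    explain, program = parse(statement)
    if explain:
        return program
    for opcode, name in program:
        if opcode == "CreateFunction":
            functions.add(name)
        elif opcode == "DropFunction":
            functions.discard(name)
    return []
```

With registration done inside parse instead, the EXPLAIN variant would have registered the function as a side effect, which is the breakage mentioned earlier in the thread.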

Nov 03 '22 20:11 psarna

@penberg gentle review ping

Nov 04 '22 19:11 psarna

While this PR is in review, I'll take a shot at moving the Wasm parts of the implementation straight to Rust, thus dropping the libwasmtime dependency. I believe it would be a much better developer experience, with no manual steps for downloading dependencies

Nov 07 '22 10:11 psarna

I also plan to make the Rust layer as runtime-agnostic as possible, so hopefully we could easily provide WasmEdge and Wasmer integration later as well

Nov 07 '22 10:11 psarna

@psarna Sorry for asking this so late in the development cycle, but could we refactor this so that the core code only has hooks for a Wasm runtime and move the wasmtime stuff in ext/wasm, for example? The reasoning is that the application that embeds libSQL could already have a Wasm runtime (for example, the ChiselStrike runtime already has V8), and it would be great to be able to hook into that. What do you think of this type of design approach?

Nov 07 '22 11:11 penberg

@penberg actually, that's kind of what I'm doing right now - moving the actual implementation of all the bindings to Rust - and then everyone can add their own bindings. ext/wasm is already taken for the other way round, compiling libsql to WebAssembly :innocent: but I'll pick another directory name

Nov 07 '22 11:11 psarna

@penberg I did the following v4 change: instead of using wasmtime interface directly, as of now there exists a vendor-agnostic ext/udf/wasm_bindings.h header, and a ext/udf/wasmtime_bindings.c implementation. The vendor-agnostic header is based on wasmtime's C interface on a 1:1 basis for now, as it's the only working backend for now anyway, and it can be subject to change later. With that, integration with V8 can be done as follows:

Produce a ext/udf/v8_bindings.c file implementing the interface defined in ext/udf/wasm_bindings.h
Add it to the build process by editing thousands of Makefile.in, configure.ac and other autoconf files

Here's the commit that introduces the change: https://github.com/libsql/libsql/pull/45/commits/f7890509006497d610a771c6f2ab9d437741dc89

Opinions?

Nov 07 '22 13:11 psarna

Oh and my idea for later is to build a Rust-based implementation of the ext/udf/wasm_bindings.h header, and then we can drop the libwasmtime dependency, because we'll be building a version of it ourselves. That would also be a great opportunity to validate if the interface from ext/udf/wasm_bindings.h is convenient enough

Nov 07 '22 13:11 psarna

@penberg scratch that, it's not the best approach. Tomorrow I'll make wasm_bindings.h considerably smaller, consisting only of functions to instantiate a new wasm function and run it. Then, one of the implementations of this interface will be the code we have now, and v8 can have its own specialized impl.

Nov 07 '22 19:11 psarna

@penberg v5: there, that's more like it, the implementation now simply needs 2 functions: https://github.com/libsql/libsql/pull/45/commits/aed9bd8f36ce335507785529df68917c7dd5015b , that should be easy enough to port to v8 and native Rust implementation of the Wasmtime backend.
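
For readers who prefer code to prose, the shape of such a two-function interface might look roughly like the sketch below. It does not transcribe wasm_bindings.h: WasmBackend, instantiate, run and EchoBackend are all hypothetical names, and the stand-in backend executes nothing.

```python
from abc import ABC, abstractmethod


class WasmBackend(ABC):
    """Hypothetical two-operation interface a runtime would need to provide."""

    @abstractmethod
    def instantiate(self, name, source):
        """Prepare a function from its source and return an opaque handle."""

    @abstractmethod
    def run(self, handle, args):
        """Call a previously instantiated function with the given arguments."""


class EchoBackend(WasmBackend):
    """Stand-in backend: it executes nothing and returns its arguments."""

    def instantiate(self, name, source):
        return {"name": name, "source": source}

    def run(self, handle, args):
        return (handle["name"], tuple(args))


def call_udf(backend, name, source, args):
    """Engine-side code depends only on the two interface methods."""
    handle = backend.instantiate(name, source)
    return backend.run(handle, args)
```

A wasmtime, V8 or native Rust backend would then only need to supply its own pair of operations, while the engine-side call path stays unchanged.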

Nov 07 '22 20:11 psarna
