libsql
Allow registering and executing WebAssembly functions
This draft implements a mechanism for registering and running Wasm functions. The current runtime of choice is wasmtime and its libwasmtime.so library with C bindings (but a switch to Rust should be considered, because that's the native language of wasmtime and the only interface which offers all of its features).
It operates on a very crude ABI (ref:#16), where ints and doubles are passed to/from WebAssembly as is, and for strings/blobs/null it passes a pointer to a structure:
- string:
[1 byte for type specification][data]
- blob:
[1 byte for type specification][4 bytes of size][data]
- null:
[1 byte for type specification]
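To make the layout concrete, here is a minimal Python sketch that packs and unpacks values the way the list above describes. The description does not pin down the details, so several things here are assumptions made for illustration: the tag values (3, 4, 5, borrowed from SQLite's SQLITE_TEXT, SQLITE_BLOB and SQLITE_NULL constants), the NUL terminator on strings, and the big-endian size field. The names `encode` and `decode` are hypothetical and do not exist in the PR.

```python
import struct

# Assumed tag values, borrowed from SQLite's SQLITE_TEXT/BLOB/NULL constants.
TAG_TEXT, TAG_BLOB, TAG_NULL = 3, 4, 5


def encode(value):
    """Pack a string, blob or None into the byte layout sketched above."""
    if value is None:
        return bytes([TAG_NULL])
    if isinstance(value, str):
        # Assumption: the string carries no length, so it is NUL-terminated.
        return bytes([TAG_TEXT]) + value.encode("utf-8") + b"\x00"
    if isinstance(value, (bytes, bytearray)):
        # Assumption: the 4-byte size field is big-endian.
        return bytes([TAG_BLOB]) + struct.pack(">I", len(value)) + bytes(value)
    raise TypeError("ints and doubles are passed as-is, not through a pointer")


def decode(buf):
    """Inverse of encode(): read the tag byte, then the payload."""
    tag = buf[0]
    if tag == TAG_NULL:
        return None
    if tag == TAG_TEXT:
        end = buf.index(b"\x00", 1)  # payload runs up to the terminator
        return buf[1:end].decode("utf-8")
    if tag == TAG_BLOB:
        (size,) = struct.unpack_from(">I", buf, 1)
        return bytes(buf[5:5 + size])
    raise ValueError(f"unknown type tag {tag}")
```

In the real implementation these bytes would live in the Wasm module's linear memory and only a pointer to them would cross the boundary; the sketch covers the byte layout only.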
The way it's implemented now is twofold:
- There's an internal run_wasm function, capable of running WebAssembly and translating the parameter types from and to the Wasm module
- A dynamic lookup table, currently a regular SQL table: CREATE TABLE libsql_wasm_func_table(name text PRIMARY KEY, body text), which actually also needs to be created and filled manually at the time of this writing.
After creating and filling the new meta-table, when a function call is used in a statement, e.g. SELECT id, fib(id) FROM t, and function fib is neither built-in nor user-defined, it will be looked up in the table. If found, its body will be assumed to hold valid WebAssembly code, compiled and run.
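The resolution order described above can be imitated from the outside with Python's built-in sqlite3 module. This is only a sketch of the lookup logic, not the C code in this PR: `resolve`, `lookup_wasm_body`, the `known_functions` set and the returned tuples are all hypothetical names chosen for illustration.

```python
import sqlite3


def lookup_wasm_body(conn, name):
    """Return the stored body for `name`, or None if the table has no such row."""
    row = conn.execute(
        "SELECT body FROM libsql_wasm_func_table WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def resolve(conn, name, known_functions):
    """Mimic the resolution order: known functions first, then the meta-table.

    `known_functions` is a hypothetical stand-in for the built-in and
    user-defined functions the engine already knows about.
    """
    if name in known_functions:
        return ("native", None)
    body = lookup_wasm_body(conn, name)
    if body is not None:
        return ("wasm", body)  # the engine would compile and run this body
    return ("missing", None)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE libsql_wasm_func_table(name text PRIMARY KEY, body text)"
)
conn.execute("INSERT INTO libsql_wasm_func_table VALUES ('fib', '(module)')")
```

With this setup, `resolve(conn, "fib", {"abs"})` falls through to the table and returns the stored body, while `resolve(conn, "abs", {"abs"})` never touches it.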
To enable WebAssembly integration, pass the --enable-wasm-runtime parameter to configure: ./configure --enable-wasm-runtime
A few example WebAssembly-based user-defined functions coded in Rust can be found here: https://github.com/psarna/libsql_bindgen
Here's an inline demo for testing purposes, with a WebAssembly fibonacci sequence already compiled from Rust and copied in-place:
CREATE TABLE IF NOT EXISTS libsql_wasm_func_table(name text PRIMARY KEY, body text);
INSERT INTO libsql_wasm_func_table (name, body) VALUES ('fib', '
(module
(type (;0;) (func (param i64) (result i64)))
(func $fib (type 0) (param i64) (result i64)
(local i64)
i64.const 0
local.set 1
block ;; label = @1
local.get 0
i64.const 2
i64.lt_u
br_if 0 (;@1;)
i64.const 0
local.set 1
loop ;; label = @2
local.get 0
i64.const -1
i64.add
call $fib
local.get 1
i64.add
local.set 1
local.get 0
i64.const -2
i64.add
local.tee 0
i64.const 1
i64.gt_u
br_if 0 (;@2;)
end
end
local.get 0
local.get 1
i64.add)
(memory (;0;) 16)
(global $__stack_pointer (mut i32) (i32.const 1048576))
(global (;1;) i32 (i32.const 1048576))
(global (;2;) i32 (i32.const 1048576))
(export "memory" (memory 0))
(export "fib" (func $fib)))
');
CREATE TABLE IF NOT EXISTS example(id int PRIMARY KEY);
INSERT OR REPLACE INTO example(id) VALUES (7);
INSERT OR REPLACE INTO example(id) VALUES (8);
INSERT OR REPLACE INTO example(id) VALUES (9);
SELECT id, fib(id) FROM example;
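As a sanity check for the demo, the control flow of the fib module above can be transcribed by hand into Python. This is a manual translation for illustration, not the output of any tool, and it ignores the unsigned comparisons (they only matter for negative inputs). If the Wasm function is invoked as described, the final SELECT should pair ids 7, 8 and 9 with 13, 21 and 34.

```python
def fib(n):
    """Hand transcription of the $fib function from the .wat module."""
    acc = 0                  # local 1
    if n >= 2:               # `br_if 0 (;@1;)` leaves the block when n < 2
        while True:          # body of the `loop`
            acc += fib(n - 1)
            n -= 2           # `local.tee 0` stores n - 2 back into local 0
            if not n > 1:    # `br_if 0 (;@2;)` repeats while n > 1
                break
    return n + acc           # the final `i64.add`
```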
This series also comes with syntactic sugar for registering and deregistering Wasm functions dynamically via SQL: CREATE FUNCTION and DROP FUNCTION.
Fixes #18
Fixes #17
This is only a draft for a multitude of reasons, the most important ones being:
- [x] lack of automated tests
- [x] currently, invoking Wasm-based user-defined functions causes an explicit memory leak during lookup; these functions need to be tracked and cached (also to avoid Wasm recompilation). Namely, once registered dynamically, the function should simply end up on the list of all the other user-defined functions
Great work! Perhaps you could consider CNCF's WasmEdge, which has a well maintained C SDK with LLVM-based AOT support for embedding. :)
https://github.com/wasmedge/wasmedge
https://wasmedge.org/book/en/sdk/c.html
Disclaimer: I am a maintainer at WasmEdge. We helped Nebula Graph and TiDB to support similar Wasm UDFs in their SQL DBs.
@juntao I actually looked it up earlier today, we're definitely interested in giving it a go! And, eventually, make the implementation runtime-agnostic by relying on Wasm C API (https://github.com/WebAssembly/wasm-c-api) that @losfair mentioned in another issue.
I remember from my morning research that the C dynamic library from WasmEdge release page was ~50MB, which is quite heavy compared to libwasmtime's 17 - are you aware of any thinner versions of it?
Yes. I believe WasmEdge supports the standard C API -- I will confirm.
The WasmEdge dynamic library really should not be that big. The distribution binary of WasmEdge is only 8MB. I think the large version contains LLVM so that it can do AOT compilation w/o external dependency. Let me double check and revert. Thank you!
Hi,
The official release contains the ahead-of-time compiler (with LLVM inside), so it may take more space. However, if you are looking for a tiny version, we have wasmedge/slim-runtime, which is the runtime only, without the compiler inside.
$ file libwasmedge.so.0.0.0
libwasmedge.so.0.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=24d0e767d4f29f65dedc1c334e72e2320b6d391c, stripped
$ size libwasmedge.so.0.0.0
text data bss dec hex filename
1806540 45848 904 1853292 1c476c libwasmedge.so.0.0.0
~= 1.8M
Thank you @hydai
@psarna I think the 1.8MB WasmEdge runtime library is sufficient for your use case. Developers can compile their functions to regular Wasm in any tool they choose. They can further use the wasmedgec tool to do AOT compiling before submitting the Wasm file to the database engine. The 1.8MB runtime library can handle both cases.
Ref: https://wasmedge.org/book/en/quick_start/run_in_aot_mode.html
Also, we do not yet support the proposed "standard" C API. But it could be supported if there is user demand. :)
Splendid, thanks guys! 1.8MiB sounds way more aligned with edge use cases indeed, will give it a try
v2:
- .wat source code is now precompiled to a wasm module during initialization, once
- each valid wasm function is now dynamically registered as a user-defined function, so it's not recompiled on consecutive executions
Still to do: automated tests
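The compile-once, register-once idea from v2 can be sketched with Python's sqlite3 module, whose create_function call adds a callable to the connection's user-defined functions. This is an illustration under assumptions rather than the PR's C code: `WasmFunctionCache`, `fake_compile` and `double_it` are hypothetical, and the "compiler" is a stub that just records its calls and returns a Python lambda.

```python
import sqlite3


class WasmFunctionCache:
    """Compile each function body once and register it as a regular UDF."""

    def __init__(self, conn, compile_fn):
        self.conn = conn
        self.compile_fn = compile_fn  # hypothetical stand-in for the Wasm compiler
        self.compiled = {}

    def register(self, name, body, n_args=1):
        if name not in self.compiled:
            # Compilation happens only on the first registration.
            self.compiled[name] = self.compile_fn(body)
            # From here on SQLite treats it like any other user-defined function.
            self.conn.create_function(name, n_args, self.compiled[name])
        return self.compiled[name]


compile_calls = []


def fake_compile(body):
    """Pretend compiler: records each invocation and returns a Python callable."""
    compile_calls.append(body)
    return lambda x: x * 2


conn = sqlite3.connect(":memory:")
cache = WasmFunctionCache(conn, fake_compile)
cache.register("double_it", "(module)")
cache.register("double_it", "(module)")  # second call is served from the cache
result = conn.execute("SELECT double_it(21), double_it(4)").fetchone()
```

Both calls in the SELECT reuse the single compiled callable, which is the behaviour the v2 notes describe for consecutive executions.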
v3:
- multiple fixes
- added CREATE FUNCTION statement
- added DROP FUNCTION statement
TODO:
- [x] automated tests
- [x] docs
Great work. We are working on using Wasm to supply UDFs in the openGauss database too. In our work, we supply an init function that loads the Wasm module from a local .wasm or .wat file and parses the file to get information about the exported functions. Users can then look up the exported function signatures through the tables we supply.
We also made a demo that runs Wasm code in the openGauss database, along with a Docker image for trying it out. You can find the project and Docker image info here: https://github.com/Nelson-He/openGauss-wasm
Hope we can keep in touch and exchange thoughts further.
@penberg please don't merge just yet as it needs more quality Rust tests, covering all types and so on, but I'm marking this PR as ready for review, because the main code is.
@penberg I added a few more tests, I'm yet to produce docs (mentioning that the thing is experimental), but code review is welcome!
self note: there's a spontaneous double free caused by dropping a wasm function, will investigate tomorrow
(false alarm, the bug was in the tests)
I added an initial doc explaining Wasm functions and configuration
todo based on manual tests: CREATE FUNCTION should accept not only strings but arbitrary expressions - that would allow using readfile for uploading large .wat files
Another todo: we should seriously consider switching to storing raw .wasm binary blobs exclusively, or at least add such support next to .wat files. Wat files are human readable and more verbose, but also more bloated, and require compilation. .wasm on the other hand would allow us to use considerably lighter runtimes (e.g. WasmEdge in slim mode) and would not require a compilation step
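If both formats end up being supported side by side, the engine needs some way to tell them apart. One option, sketched below, relies on the fact that every WebAssembly binary starts with the four bytes 0x00 0x61 0x73 0x6D ("\0asm"). The function names and the "(module" prefix heuristic for .wat text are assumptions made for this sketch; real .wat files may begin with comments or other forms.

```python
WASM_MAGIC = b"\x00asm"


def looks_like_wasm_binary(body):
    """True if `body` starts with the 4-byte WebAssembly binary magic."""
    return isinstance(body, (bytes, bytearray)) and bytes(body[:4]) == WASM_MAGIC


def classify_body(body):
    """Guess whether a stored function body is binary .wasm or textual .wat."""
    if looks_like_wasm_binary(body):
        return "wasm"
    if isinstance(body, (bytes, bytearray)):
        text = body.decode("utf-8", "replace")
    else:
        text = body
    # Heuristic only: treat anything that opens with "(module" as .wat text.
    return "wat" if text.lstrip().startswith("(module") else "unknown"
```

A "wasm" result could be handed directly to a runtime without a text parser, which is the lighter path the comment above argues for.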
v4:
- blobs are supported
@penberg @glommer I would appreciate external input on a decision:
Right now, this PR abuses the modularity of libSQL a little, because functions are registered at parsing stage, not later, when the virtual machine executes the statement. Ideally, the only effect of the parsing stage should be vdbe opcodes, to be executed by vdbe later. Implementing it the proper way would take more time and be more intrusive (a new opcode, incompatible with sqlite, would be introduced), but, well, it will be more proper. Do you think I should go ahead and add it to this PR, or move it to a follow-up? Backward compatibility will be preserved for users anyway, but if we prefer this PR to be self-sufficient and solid, it's better if I spend more time on it and add a new opcode on top.
edit: I think I don't need external input after all: without doing it properly, things like the EXPLAIN statement get broken, so it's not really acceptable. I'll provide v5 with proper new opcodes as soon as possible
v5:
- opcodes for creating and dropping functions are added
- EXPLAIN now works, verified with a Rust test
@penberg gentle review ping
While this PR is in review, I'll take a shot at trying to move the Wasm parts of the implementation straight to Rust, thus dropping the libwasmtime dependency. It would be a much better developer experience, I believe, with no manual steps for downloading dependencies
I also plan to make the Rust layer as runtime-agnostic as possible, so hopefully we could easily provide WasmEdge and Wasmer integration later as well
@psarna Sorry for asking this so late in the development cycle, but could we refactor this so that the core code only has hooks for a Wasm runtime and move the wasmtime stuff into ext/wasm, for example? The reasoning is that the application that embeds libSQL could already have a Wasm runtime (for example, the ChiselStrike runtime already has V8), and it would be great to be able to hook into that. What do you think of this type of design approach?
@penberg actually, that's kind of what I'm doing right now - moving the actual implementation of all the bindings to Rust - and then everyone can add their own bindings. ext/wasm is already taken for the other way round, compiling libsql to WebAssembly :innocent: but I'll pick another directory name
@penberg I did the following v4 change: instead of using the wasmtime interface directly, there is now a vendor-agnostic ext/udf/wasm_bindings.h header and an ext/udf/wasmtime_bindings.c implementation. The vendor-agnostic header is based on wasmtime's C interface on a 1:1 basis for now, as it's the only working backend anyway, and it can be subject to change later. With that, integration with V8 can be done as follows:
- Produce an ext/udf/v8_bindings.c file implementing the interface defined in ext/udf/wasm_bindings.h
- Add it to the build process by editing thousands of Makefile.in, configure.ac and other autoconf files
Here's the commit that introduces the change: https://github.com/libsql/libsql/pull/45/commits/f7890509006497d610a771c6f2ab9d437741dc89
Opinions?
Oh and my idea for later is to build a Rust-based implementation of the ext/udf/wasm_bindings.h header, and then we can drop the libwasmtime dependency, because we'll be building a version of it ourselves. That would also be a great opportunity to validate if the interface from ext/udf/wasm_bindings.h is convenient enough
@penberg scratch that, it's not the best approach. Tomorrow I'll make wasm_bindings.h considerably smaller, consisting only of functions to instantiate a new wasm function and run it. Then, one of the implementations of this interface will be the code we have now, and v8 can have its own specialized impl.
@penberg v5: there, that's more like it, the implementation now simply needs 2 functions: https://github.com/libsql/libsql/pull/45/commits/aed9bd8f36ce335507785529df68917c7dd5015b , which should be easy enough to port to V8 and to a native Rust implementation of the Wasmtime backend.
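For readers who want a feel for what a two-function backend contract looks like, here is a rough Python analogue. It does not reproduce the actual C header from the commit above: the method names `instantiate` and `run`, the handle shape and the `EchoBackend` toy are all hypothetical, chosen to mirror the "instantiate a new wasm function and run it" split mentioned earlier in the thread.

```python
from abc import ABC, abstractmethod


class WasmBackend(ABC):
    """Hypothetical two-method backend interface (names are illustrative)."""

    @abstractmethod
    def instantiate(self, name, body):
        """Compile `body` and return an opaque handle for function `name`."""

    @abstractmethod
    def run(self, handle, args):
        """Call the function behind `handle` with `args` and return its result."""


class EchoBackend(WasmBackend):
    """Toy backend standing in for wasmtime or V8; it runs no Wasm at all."""

    def instantiate(self, name, body):
        return {"name": name, "body": body}

    def run(self, handle, args):
        return (handle["name"], tuple(args))


def call_udf(backend, name, body, args):
    """Core-side code only ever touches the two interface methods."""
    handle = backend.instantiate(name, body)
    return backend.run(handle, args)
```

The point of keeping the surface this small is that a new runtime only has to supply those two operations, while the SQL-facing code stays unchanged.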