WebAssembly based parser
WebAssembly is already available (behind a flag) in Node 6, so it might be worth trying as the next level of the "Userspace JIT" approach.
Some overview: https://ia601503.us.archive.org/32/items/vmss16/titzer.pdf
Working wasm examples ( run with node --expose-wasm):
function fib(stdlib, foreign, heap) {
  "use asm";
  var i32 = new stdlib.Int32Array(heap);
  var f64 = new stdlib.Float64Array(heap);
  var imul = stdlib.Math.imul;
  function fib(n) {
    n = n|0;
    if (n >>> 0 < 3) {
      return 1|0;
    }
    return (fib((n-1)|0) + fib((n-2)|0))|0;
  }
  return {
    fib: fib
  };
}
var m = _WASMEXP_.instantiateModuleFromAsm(fib.toString());
var asmFib = m.fib;
var n = 38;
console.time("ASM:fib(" + n + ")");
var f = asmFib(n, global);
console.log(f);
console.timeEnd("ASM:fib(" + n + ")");
binary wasm:
https://gist.github.com/sidorares/90607f73b499f2ccb7dd908a080ebe5d
Problems:
- no objects or strings in output. The only possible way I see now: generate a JSON representation on the heap and use JSON.parse. Might still be fast! (JSON.parse is very fast and often surpasses manual object creation in speed)
- quite a lot of work for an unknown performance gain
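A minimal sketch of the "JSON on the heap" idea, in plain JS so it can be run without any wasm module. The `heap` array stands in for wasm linear memory, and `writeRowAsJson` / `readRow` are hypothetical helpers, not part of any existing API; a real generated module would write the bytes itself and only hand a pointer and a length back to JS.

```javascript
// Stand-in for wasm linear memory (assumption: a real module would
// expose its own memory buffer instead of this array).
const heap = new Uint8Array(1024);

// Hypothetical helper simulating the wasm side: write one row as JSON
// text (UTF-8 bytes) at `ptr` and return the number of bytes written.
function writeRowAsJson(heap, ptr, foo, bar) {
  const text = '{"foo":' + JSON.stringify(foo) + ',"bar":' + bar + "}";
  const bytes = Buffer.from(text, "utf8");
  heap.set(bytes, ptr);
  return bytes.length;
}

// JS side: view the bytes without copying, decode them to a string and
// let JSON.parse build the row object.
function readRow(heap, ptr, len) {
  const view = Buffer.from(heap.buffer, heap.byteOffset + ptr, len);
  return JSON.parse(view.toString("utf8"));
}

const ptr = 16;
const len = writeRowAsJson(heap, ptr, "hello", 42);
const row = readRow(heap, ptr, len);
console.log(row);
```

Whether this beats building the object field by field would still have to be measured; the sketch only shows that the JS side reduces to one decode and one JSON.parse per row.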
very good read on generating wasm: https://github.com/zbjornson/human-asmjs
update for node 6.x (tested with 6.9.1): _WASMEXP_ has now become Wasm (not sure if that can be overridden)
https://github.com/reklatsmasters/webassembly-examples
Thanks for the link to my examples! I think it's possible to use the internal yacc-based SQL parser sql/sql_yacc.yy. However, it's quite difficult.
@reklatsmasters thanks for your work! At the moment I think it's better to generate raw wasm on the fly rather than trying to compile C code
@sidorares Hm, interesting. You want to use current js-based parser compiled to wasm? What does "generate raw wasm on the fly" mean?
no, a bit different
Mysql protocol looks like this:
client: "SELECT foo,bar from FOOBAR"
server: "OK!"
server: "This is what I have"
server: "foo:string"
server: "bar:int"
server: "now data:"
server: "foo value, bar value"
server: "foo value, bar value"
...
...
server: "foo value, bar value"
server: "done"
What I'm currently doing: when the schema is known (just before the "now data" part), a JS function is generated that is optimised for deserialising data of only that particular shape from raw packets (i.e. read string, then read int).
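A rough sketch of that idea, not the driver's actual code. The `Packet` class, its wire format (1-byte length-prefixed strings, 32-bit little-endian ints) and `compileRowParser` are all assumptions made up for illustration; the point is only that the row parser's source is assembled from the schema and compiled once.

```javascript
// Hypothetical minimal packet reader; the real one in the driver differs.
class Packet {
  constructor(buffer) {
    this.buffer = buffer;
    this.offset = 0;
  }
  // Assumed format: one length byte followed by that many UTF-8 bytes.
  readString() {
    const len = this.buffer.readUInt8(this.offset);
    const start = this.offset + 1;
    this.offset = start + len;
    return this.buffer.toString("utf8", start, start + len);
  }
  // Assumed format: 32-bit little-endian signed integer.
  readInt() {
    const value = this.buffer.readInt32LE(this.offset);
    this.offset += 4;
    return value;
  }
}

// Build the source of a row parser specialised for one schema, then
// compile it with new Function. Column order is the read order.
function compileRowParser(columns) {
  const readers = { string: "readString", int: "readInt" };
  const lines = columns.map(
    (c) => "  " + JSON.stringify(c.name) + ": packet." + readers[c.type] + "(),"
  );
  const source = "return {\n" + lines.join("\n") + "\n};";
  return new Function("packet", source);
}

const parseRow = compileRowParser([
  { name: "foo", type: "string" },
  { name: "bar", type: "int" },
]);

// One row: the string "abc" followed by the int 7.
const raw = Buffer.concat([
  Buffer.from([3]),
  Buffer.from("abc"),
  Buffer.from([7, 0, 0, 0]),
]);
const parsedRow = parseRow(new Packet(raw));
console.log(parsedRow);
```

The generated function has no loops or type switches left in it, which is what makes it friendly to the JIT.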
I want to try to implement that part ( "generate a deserialiser" function at runtime ) in wasm
Any help would be really appreciated!
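To make "generate raw wasm on the fly" a bit more concrete, here is a small sketch using the standard WebAssembly JS API that newer Node versions ship (not the old flag-gated `_WASMEXP_` object). `buildBinaryOp` and the `OPCODES` table are hypothetical names; the module bytes are assembled in JS at runtime, and a single byte of the function body depends on the input. A real deserialiser generator would emit a much longer body from the schema in the same way.

```javascript
// i32.add, i32.sub, i32.mul opcodes from the wasm binary format.
const OPCODES = { add: 0x6a, sub: 0x6b, mul: 0x6c };

// Assemble a module exporting f(a: i32, b: i32) -> i32 and return f.
function buildBinaryOp(op) {
  const bytes = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
    0x01, 0x00, 0x00, 0x00, // binary format version 1
    // type section: one type, (i32, i32) -> i32
    0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
    // function section: one function using type 0
    0x03, 0x02, 0x01, 0x00,
    // export section: export function 0 under the name "f"
    0x07, 0x05, 0x01, 0x01, 0x66, 0x00, 0x00,
    // code section: one body of 7 bytes with no locals
    0x0a, 0x09, 0x01, 0x07, 0x00,
    0x20, 0x00, // local.get 0
    0x20, 0x01, // local.get 1
    OPCODES[op], // the only generated byte in this sketch
    0x0b, // end
  ]);
  const mod = new WebAssembly.Module(bytes); // synchronous compile
  return new WebAssembly.Instance(mod).exports.f;
}

const add = buildBinaryOp("add");
const mul = buildBinaryOp("mul");
console.log(add(2, 3), mul(6, 7));
```

Section sizes are hard-coded here because the body length never changes; a generator with variable-length bodies would have to compute and LEB128-encode them.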
> I want to try to implement that part ( "generate a deserialiser" function at runtime ) in wasm
In that case you would have to call JS functions from wasm to parse the incoming message (call packet.parseEtc()), and that is sometimes quite slow. I think the parser of the incoming message, the generator of the deserialiser, and the deserialiser itself should all be implemented in wasm. It may look like this:
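// example in C
typedef enum {
  INT,
  FLOAT,
  // ...
} FieldType;

typedef struct Schema {
  // ...
} Schema;

// parse one incoming message according to the schema
void parse_message(const char* buffer, int size, const Schema* schema);
// create an empty schema
Schema* define_schema(void);
// append a field to the schema (it is modified, so the pointer is not const)
void append_field(Schema* schema, FieldType field);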
> call packet.parseEtc()
All this parseXXX code could also be inlined into the resulting wasm
Input: schema description, followed by stream of binary data matching schema
Output: a wasm function that calls an external JS function for each received row with all data deserialised, with a minimum amount of extra work for JS to do. Since wasm does not have objects or strings, the closest we can have is an ArrayBuffer containing JSON with the result.
> inlined to resulting wasm
If it means 'implemented in wasm', I vote "yes".
> that calls external JS function
It's necessary to minimise external calls. In the best case, remove them all.
> wasm does not have objects or strings
We can interpret a part of wasm memory as a string:
- give a pointer to the beginning of the data
- give the size of the string
Buffer.from(memory.slice(ptr, ptr + size)).toString()
Also, we can read null-terminated strings.
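Both variants can be sketched as below, assuming `memory` is a standard `WebAssembly.Memory` (so the bytes live in `memory.buffer`). The helper names `readString` and `readCString` are made up for this example, and the write at the top only simulates what the wasm side would do.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page

// Simulate the wasm side: write "foo value" plus a 0 terminator at ptr.
const ptr = 8;
new Uint8Array(memory.buffer).set(Buffer.from("foo value\0", "utf8"), ptr);

// Variant 1: pointer + size. Buffer.from(arrayBuffer, byteOffset, length)
// creates a view over the same memory, so nothing is copied before decoding.
function readString(memory, ptr, size) {
  return Buffer.from(memory.buffer, ptr, size).toString("utf8");
}

// Variant 2: null-terminated. Scan for the first 0 byte at or after ptr.
function readCString(memory, ptr) {
  const view = new Uint8Array(memory.buffer);
  const end = view.indexOf(0, ptr);
  if (end === -1) {
    throw new RangeError("no terminator found");
  }
  return Buffer.from(memory.buffer, ptr, end - ptr).toString("utf8");
}

const bySize = readString(memory, ptr, 9);
const byTerminator = readCString(memory, ptr);
console.log(bySize, byTerminator);
```

One caveat: `memory.buffer` is replaced when the memory grows, so views should be created at read time as done here rather than cached.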
It would be good to set up an initial benchmark as a first step: manually generate a parser for some predefined schema and compare its speed with the JS parser. Can you help with this @reklatsmasters ?
hey folks, waking up this thread! @sidorares nowadays node has way better support for WASM, so I think there's a lot of benefit we can get from it.
personally I don't have a clue about C, but if Rust could be considered, I'd love to contribute to this.
looks like AssemblyScript may also be a good option for the mysql2 use case. It is a TypeScript-like language (so it offers a friendlier experience for JS developers), but still has the benefits and performance of WebAssembly.