prepack
LLVM Backend
This lets us Prepack programs to native machine code or WebAssembly, without a JS runtime.
Prepack knows a lot about a program that it can evaluate. It is also highly specialized in getting rid of intermediate objects.
Most of the complexity of the serializer has to do with residual objects and closures that might leak to other JS.
Most of the complexity of a JS runtime comes from supporting the object model.
If we forbid leaking objects and require that Prepack has full knowledge of the program, then we know a lot about the types. This won't work with existing programs, but new programs written with these constraints in mind could benefit from it.
I wrote a new backend in parallel to the normal serializer. The two problem spaces don't have much in common, so I decided to add a new serializer rather than build on the existing one.
Type System
In this first PR, only booleans and numbers are supported at the interop layer, but I expect to support closures, symbols, and array buffers. Longer term, we can support strings and Typed Objects.
The type system is currently strongly typed, so it rejects a program in which an abstract value can yield more than one type.
Functions are modeled as a normal function that does return __abstract(':void', 'linkMethodName'). The argument types are inferred from the arguments.
I model booleans as i1, integrals as i32, and other numbers as f64.
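To make the mapping and the strictness rule concrete, here is a minimal sketch. The helper names (llvmTypeOf, unifyTypes, declareExternal) are hypothetical, not Prepack's API, and the sketch simplifies an abstract value to the list of concrete values it might take. It also assumes that "integral" means an integer that fits in 32 bits, that i32 and f64 are never widened into each other, and that ':void' is the return type; the PR doesn't spell out any of these. LLVM IR spells f64 as double.

```javascript
// Hypothetical helpers illustrating the interop type rules; not Prepack's API.

// How each type is spelled in LLVM IR text.
const IR_SPELLING = { i1: "i1", i32: "i32", f64: "double" };

// Maps a concrete JS value to i1, i32, or f64.
function llvmTypeOf(value) {
  if (typeof value === "boolean") return "i1";
  if (typeof value === "number") {
    // Assumption: "integral" means an integer that fits in 32 bits.
    // -0 stays f64 because an i32 cannot represent its sign.
    const fitsI32 =
      Number.isInteger(value) && value >= -(2 ** 31) && value <= 2 ** 31 - 1;
    return fitsI32 && !Object.is(value, -0) ? "i32" : "f64";
  }
  throw new TypeError(`unsupported interop type: ${typeof value}`);
}

// Strict rule: every value an abstract value may take must map to the same
// type. Assumption: i32 and f64 are not widened into each other.
function unifyTypes(possibleValues) {
  const types = [...new Set(possibleValues.map(llvmTypeOf))];
  if (types.length !== 1) {
    throw new TypeError(`expected exactly one type, got: ${types.join(", ") || "none"}`);
  }
  return types[0];
}

// Declares an external function whose parameter types are inferred from the
// arguments of a call; the return type is void (assumed from ':void').
function declareExternal(linkName, args) {
  const params = args.map((arg) => IR_SPELLING[llvmTypeOf(arg)]);
  return `declare void @${linkName}(${params.join(", ")})`;
}

console.log(declareExternal("linkMethodName", [true, 3, 1.5]));
// prints: declare void @linkMethodName(i1, i32, double)
```

With this rule, an abstract value that could be either true or 1 is rejected by unifyTypes instead of being silently coerced.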
Limitations
The limitations mostly fall within the same set of problems we're currently investigating: loops and recursive functions are not allowed.
The generated code must inline everything to completely get rid of all objects. This can yield bloated and suboptimal code.
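As a rough illustration of what "inline everything" means, here is a hand-written before/after pair (not actual Prepack output). The first version allocates an intermediate object per call; the second is what remains once the helpers are inlined and every property read is replaced by the value that was stored.

```javascript
// Hand-written illustration of full inlining; not actual Prepack output.

// Before: every call to lengthSquared allocates an intermediate object.
function makeVec(x, y) {
  return { x, y };
}

function dot(a, b) {
  return a.x * b.x + a.y * b.y;
}

function lengthSquared(x, y) {
  const v = makeVec(x, y);
  return dot(v, v);
}

// After: makeVec and dot are inlined, and each property read (v.x, v.y) is
// replaced by the value that was stored, so no object is allocated at all.
function lengthSquaredInlined(x, y) {
  return x * x + y * y;
}

console.log(lengthSquared(3, 4), lengthSquaredInlined(3, 4)); // 25 25
```

A recursive function can't, in general, be flattened this way, because the inlining would not terminate, which is consistent with the restriction above.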
In the future I hope that we can use arena allocation of custom object structures to temporarily store values created in recursive functions and loops.
Is This Useful?
I could see this being helpful for simpler functions such as animation functions that need to run on a different thread, audio processing functions, and simple but highly parallelizable functions like shaders.
It could potentially be useful for some React components that need to execute with extreme performance.
Installation
This PR adds an optional dependency on the llvm-node project, which contains Node bindings to LLVM.
Downstream users of prepack don't get these dependencies installed automatically; instead, they have to be installed manually in the parent project. For this reason, the prepack CLI lazily requires these modules and prints an error message if they're not installed.
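A minimal sketch of the lazy-require pattern, assuming a CommonJS module. loadOptionalModule is a hypothetical name and the message text is illustrative, not the CLI's actual output. It treats only MODULE_NOT_FOUND as "not installed" and rethrows anything else.

```javascript
// Sketch of lazily loading an optional dependency (CommonJS).
// loadOptionalModule is a hypothetical name, not the prepack CLI's code.
function loadOptionalModule(moduleName) {
  try {
    return require(moduleName);
  } catch (err) {
    // Treat only "not installed" as a soft failure; rethrow anything else,
    // such as an exception thrown while the module was initializing.
    if (err && err.code === "MODULE_NOT_FOUND") {
      console.error(
        `${moduleName} is not installed. Install it in your project, ` +
          `for example with: yarn add ${moduleName}`
      );
      return null;
    }
    throw err;
  }
}

// Only called when LLVM output is actually requested:
// const llvm = loadOptionalModule("llvm-node");
// if (llvm === null) process.exit(1);
```

One caveat: if llvm-node is present but one of its own dependencies is missing, Node also reports MODULE_NOT_FOUND, so the same message would appear in that case.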
It requires both cmake and LLVM to be installed. llvm-node depends on the nan project, which should install automatically, but for some reason I had to install nan manually first.
macOS installation instructions:
brew install cmake
brew install llvm
yarn add nan
yarn add llvm-node
Additionally, running the yarn test-llvm command requires the lli tool (the LLVM interpreter) to be available on the PATH.
Building a Native Program
Compile to LLVM bitcode:
prepack filename.js --emitLLVM --out filename.bc
Compile to native assembly:
llc filename.bc -o filename.s
Link the program to a native executable:
gcc filename.s -o filename
Run it:
./filename
Debug by printing the LLVM IR assembly language code:
prepack filename.js --emitLLVMAssembly
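The three build steps can also be scripted. This is a sketch with hypothetical function names, using Node's child_process.execFileSync and only the flags shown above; it assumes prepack, llc, and gcc are all on the PATH.

```javascript
const { execFileSync } = require("child_process");

// Hypothetical helper: the three commands above for a given base name.
function nativeBuildSteps(name) {
  return [
    ["prepack", [`${name}.js`, "--emitLLVM", "--out", `${name}.bc`]],
    ["llc", [`${name}.bc`, "-o", `${name}.s`]],
    ["gcc", [`${name}.s`, "-o", name]],
  ];
}

// Runs the steps in order. execFileSync throws if a command is missing or
// exits with a non-zero status, so the pipeline stops at the first failure.
function buildNative(name) {
  for (const [command, args] of nativeBuildSteps(name)) {
    execFileSync(command, args, { stdio: "inherit" });
  }
}

// Usage (requires prepack, llc, and gcc on the PATH):
// buildNative("filename");  // then run ./filename
```

Keeping the command list in its own function makes it easy to inspect the pipeline without running anything.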
Future Work
- [x] Model strings as stack allocations and allow them to be passed as pointers.
- [ ] Expose ArrayBuffer as stack allocations and allow them to be passed as pointers.
- [ ] Allow optimized residual functions to be passed as callbacks. Must not mutate global module state.
- [ ] Bridge TypedObjects to some kind of memory-managed mechanism for passing rich objects to C++.
- [ ] Precompile regular expressions or link to an external library.
- [ ] Map Math methods to LLVM operations.
- [ ] Track float32 types returned by Math.fround. Allow functions to return float32.
- [ ] Implement built-in methods on strings, etc., in JavaScript.
- [ ] Implement BigInt spec as Int64.
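For the item about mapping Math methods, one possible starting point is a lookup table from Math methods to LLVM intrinsics. This is a sketch, not part of this PR, and the table is deliberately partial. Math.round is left out because llvm.round rounds halfway cases away from zero, while Math.round(-2.5) is -2. Math.pow is left out because llvm.pow follows C pow semantics, which return 1 for pow(1, NaN) where JS returns NaN. The transcendental functions (sin, cos, exp, log) are approximations on both sides, so results may differ in the last bits.

```javascript
// One possible Math-to-LLVM table; not part of this PR. Deliberately
// partial: only methods whose intrinsic lines up with the JS semantics.
const MATH_TO_LLVM_INTRINSIC = {
  sqrt: "llvm.sqrt.f64",
  abs: "llvm.fabs.f64",
  floor: "llvm.floor.f64",
  ceil: "llvm.ceil.f64",
  trunc: "llvm.trunc.f64",
  sin: "llvm.sin.f64",
  cos: "llvm.cos.f64",
  exp: "llvm.exp.f64",
  log: "llvm.log.f64",
};

// Returns the intrinsic for a Math method, or null if it needs a custom
// lowering (e.g. round, pow) or is not mapped yet.
function llvmIntrinsicFor(method) {
  return Object.prototype.hasOwnProperty.call(MATH_TO_LLVM_INTRINSIC, method)
    ? MATH_TO_LLVM_INTRINSIC[method]
    : null;
}

// Why round and pow are excluded:
console.log(Math.round(-2.5)); // -2 (llvm.round.f64 would give -3)
console.log(Math.pow(1, NaN)); // NaN (C pow, and so llvm.pow.f64, gives 1)
```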
Basically, the idea is that the main function and any residual functions are “actors” that process some data. They’ll use arena allocation and are expected to be short-lived.
A runtime outside of these can control the memory management of long-lived objects. That allows for parallelism and more efficient memory management outside of these “worklets”.
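To make the arena idea concrete, here is a minimal bump-allocator sketch over a single ArrayBuffer. Arena, runWorklet, and sumOfSquares are hypothetical names; the point is only the lifetime model: a worklet allocates temporaries freely, and the runtime around it frees everything at once after the call.

```javascript
// Minimal bump (arena) allocator over one ArrayBuffer; hypothetical names.
class Arena {
  constructor(byteLength) {
    this.buffer = new ArrayBuffer(byteLength);
    this.offset = 0;
  }

  // Allocates `count` elements of a typed-array kind. The start is rounded up
  // to the element size because typed-array views require aligned offsets.
  alloc(TypedArray, count) {
    const size = TypedArray.BYTES_PER_ELEMENT;
    const start = Math.ceil(this.offset / size) * size;
    const end = start + count * size;
    if (end > this.buffer.byteLength) {
      throw new RangeError("arena exhausted");
    }
    this.offset = end;
    return new TypedArray(this.buffer, start, count);
  }

  // Frees everything at once. Memory is not zeroed, and views handed out
  // earlier still alias the buffer, so they must not be used after a reset.
  reset() {
    this.offset = 0;
  }
}

// The surrounding runtime owns the arena and frees it after each call.
function runWorklet(arena, worklet, input) {
  try {
    return worklet(arena, input);
  } finally {
    arena.reset();
  }
}

// A short-lived worklet that keeps its temporaries in the arena.
function sumOfSquares(arena, values) {
  const squares = arena.alloc(Float64Array, values.length);
  let sum = 0;
  for (let i = 0; i < values.length; i++) {
    squares[i] = values[i] * values[i];
    sum += squares[i];
  }
  return sum;
}

const arena = new Arena(1024);
console.log(runWorklet(arena, sumOfSquares, [1, 2, 3])); // 14
```

Resetting costs the same no matter how much was allocated, which is what makes arenas a good fit for short-lived actors; anything that must outlive the call would have to be copied out before the reset.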
Fantastic work!
You bypass the existing serializer, but then still find the existing SerializationContext useful, and then also add some hacks to work with the existing BabelNodeExpressions. I wonder if that's the first thing we should clean up here: make the SerializationContext generic rather than BabelNodeExpression-specific, and generally clean it up. It really just grew out of immediate needs.
Also, I wonder if you'd soon need something like the ResidualHeapVisitor when you want to support objects. The visitor computes some useful information.
Yea, this exercise really shows where our abstractions leak and where they don't. The generators are fairly flexible but still a bit leaky. There are some things in there that aren't really necessary from the interpreter's point of view, but the interpreter has a hard dependency on them.
E.g., creating intermediate variables happens in the interpreter right now, and the generator also inserts temporary variables itself. The rest of the system is essentially SSA, so it gets a little awkward to manage both. In this PR I just undo this by keeping my own variable map and undoing the temporary assignments. We should try to move that concept out so it is completely isolated in the serialization pass instead of interleaved.
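A minimal sketch of the "own variable map" workaround described above, using hypothetical entry shapes rather than Prepack's actual generator format: temporary assignments are recorded in a map and dropped, and later references to a temporary are rewritten to the SSA value it stands for.

```javascript
// Hypothetical entry shapes, not Prepack's generator format:
//   { kind: "value", id, op, args }       an SSA value
//   { kind: "assignTemp", temp, value }   a temporary variable assignment
function resolveTemporaries(entries) {
  // Maps each temporary to the SSA value it stands for.
  const tempToValue = new Map();
  const resolve = (name) => (tempToValue.has(name) ? tempToValue.get(name) : name);
  const resolved = [];
  for (const entry of entries) {
    if (entry.kind === "assignTemp") {
      // Undo the temporary: record its value (resolving it first, so chains
      // like _1 = _0 collapse to the SSA value) and drop the assignment.
      tempToValue.set(entry.temp, resolve(entry.value));
      continue;
    }
    resolved.push({ ...entry, args: entry.args.map(resolve) });
  }
  return resolved;
}

const entries = [
  { kind: "value", id: "v1", op: "add", args: ["x", "y"] },
  { kind: "assignTemp", temp: "_0", value: "v1" },
  { kind: "value", id: "v2", op: "mul", args: ["_0", "_0"] },
];
console.log(resolveTemporaries(entries)[1].args); // [ 'v1', 'v1' ]
```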
Regarding the BabelNodeExpression hack: while the SerializationContext has an unfortunate dependency on it, the deeper issue is actually with AbstractValue, whose build node requires us to materialize nodes before we actually know what operation we're serializing. That's the one we need to think about a bit.
I originally expected LLVM to help me with much of what the visitor does, but it's lacking in some areas, so yea, I might need a pre-processing pass like the ResidualHeapVisitor.
And as a plus, the ResidualHeapVisitor is completely ignorant of BabelNodeExpressions.
Ideally, I'd like to move out all build nodes to a Babel-specific place, and instead place specific named instructions in the generator. That would allow printing the generator tree in some nice assembly format, and then different backends can be plugged in more easily.
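A small sketch of named instructions with pluggable backends. The instruction shape and both backends are hypothetical: the same instruction list is printed in an assembly-like format by one backend and turned into JavaScript source by another, without either depending on Babel nodes.

```javascript
// Hypothetical named instructions, with no dependency on Babel nodes.
const program = [
  { op: "add", dest: "t0", args: ["x", "y"] },
  { op: "mul", dest: "t1", args: ["t0", "t0"] },
  { op: "return", args: ["t1"] },
];

// Backend 1: print the instructions in an assembly-like text format.
function printAssembly(instructions) {
  return instructions
    .map(({ op, dest, args }) =>
      dest ? `${dest} = ${op} ${args.join(", ")}` : `${op} ${args.join(", ")}`
    )
    .join("\n");
}

// Backend 2: emit JavaScript source for the same instructions.
const JS_BINARY_OPS = { add: "+", mul: "*" };

function emitJavaScript(params, instructions) {
  const body = instructions.map(({ op, dest, args }) => {
    if (op === "return") return `return ${args[0]};`;
    if (!Object.prototype.hasOwnProperty.call(JS_BINARY_OPS, op)) {
      throw new Error(`unknown op: ${op}`);
    }
    return `const ${dest} = ${args[0]} ${JS_BINARY_OPS[op]} ${args[1]};`;
  });
  return `function f(${params.join(", ")}) {\n  ${body.join("\n  ")}\n}`;
}

console.log(printAssembly(program));
console.log(emitJavaScript(["x", "y"], program));
```

In this sketch, an LLVM backend would simply be a third consumer of the same instruction list.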
Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours has expired.
Before we can review or merge your code, we need you to email [email protected] with your details so we can update your status.
@NTillmann Are the two serialisers going to merge sometime in future? Coming from a contributor's perspective, what changes can we expect in the current serialiser?
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!