
LLVM Backend

Open sebmarkbage opened this issue 5 years ago • 7 comments

This lets us use Prepack to compile to native machine code or WebAssembly, without a JS runtime.

Prepack knows a lot about a program that it can evaluate. It is also highly specialized at getting rid of intermediate objects.

Most of the complexity of the serializer has to do with residual objects and closures that might leak to other JS.

Most of the complexity of a JS runtime comes from supporting the object model.

If we forbid leaking objects and Prepack has full knowledge of the program, then we know a lot about the types. This won't work with existing programs, but new programs written for these constraints could benefit from it.

I wrote a new backend in parallel to the normal serializer. The two problem spaces don't have a lot in common, so I decided to add a new serializer rather than build on the existing one.

Type System

In this first PR, only booleans and numbers are supported at the interop layer but I expect to support closures, symbols, and array buffers. Longer term we can support strings and Typed Objects.

The type system is currently strongly typed so it will reject a program where abstract values yield more than one type.

Functions are modeled as a normal function whose body is return __abstract(':void', 'linkMethodName'). The argument types are inferred from the arguments passed.

I model booleans as i1, integrals as i32 and other numbers as f64.
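The mapping above, together with the strong-typing rule, can be sketched as two small helpers. This is a hypothetical illustration, not code from the PR: the names llvmTypeFor and unifyTypes, the int32 round-trip test used to decide what counts as "integral", and the choice to treat -0 as a float are all assumptions.

```javascript
// Hypothetical sketch: pick an LLVM type name for a concrete JS value.
// Assumption: "integral" means the value round-trips through int32.
function llvmTypeFor(value) {
  if (typeof value === "boolean") return "i1";
  if (typeof value === "number") {
    // (value | 0) truncates to a signed 32-bit integer; Object.is also
    // rules out -0, NaN, and fractional or out-of-range values.
    return Object.is(value | 0, value) ? "i32" : "f64";
  }
  // Only booleans and numbers are supported so far; reject everything else.
  throw new TypeError("unsupported type at the interop layer: " + typeof value);
}

// Strong typing: a value that could have more than one type is rejected.
function unifyTypes(types) {
  const unique = [...new Set(types)];
  if (unique.length !== 1) {
    throw new TypeError("expected exactly one type, got: " + unique.join(", "));
  }
  return unique[0];
}
```

For instance, unifyTypes(["i32", "f64"]) throws here rather than widening to f64, which mirrors the "reject" behaviour described above under these assumptions.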

Limitations

The limitations are mainly in the same set of problems we're currently investigating. Loops and recursive functions are not allowed.

The generated code must inline everything to completely get rid of all objects. This can yield bloated and suboptimal code.
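As a rough illustration of what "inline everything" means, compare a function that passes an intermediate object around with the object-free form that inlining leaves behind. Both versions are hand-written for this example (the function names are hypothetical); neither is actual Prepack output.

```javascript
// Before: the intermediate object { x, y } exists only to carry two numbers.
function makePoint(x, y) {
  return { x, y };
}
function lengthSquared(p) {
  return p.x * p.x + p.y * p.y;
}
function withObjects(x, y) {
  return lengthSquared(makePoint(x, y));
}

// After: what full inlining leaves behind (hand-written, not Prepack output).
// Only numbers remain, so no object model is needed at runtime.
function fullyInlined(x, y) {
  return x * x + y * y;
}
```

The cost is that each call site gets its own copy of the inlined body, which is where the bloat comes from.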

In the future I hope that we can use arena allocation of custom object structures to temporarily store values created in recursive functions and loops.
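For readers unfamiliar with arena allocation, here is a minimal sketch of the idea in JavaScript: values are carved out of one buffer by bumping an offset, and everything is released at once. The Arena class and its method names are hypothetical and only illustrate the technique; the PR does not include an arena.

```javascript
// Hypothetical bump-pointer arena over a single ArrayBuffer.
class Arena {
  constructor(byteLength) {
    this.buffer = new ArrayBuffer(byteLength);
    this.offset = 0; // next free byte
  }

  // Reserve `count` float64 slots and return a view over them.
  allocFloat64(count) {
    const start = (this.offset + 7) & ~7; // align to 8 bytes
    const end = start + count * 8;
    if (end > this.buffer.byteLength) throw new RangeError("arena exhausted");
    this.offset = end;
    return new Float64Array(this.buffer, start, count);
  }

  // Release everything at once; individual values are never freed.
  reset() {
    this.offset = 0;
  }
}
```

A loop body or recursive call could allocate its temporaries from such an arena and call reset() once the short-lived work is done.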

Is This Useful?

I could see this as helpful for simpler functions such as animation functions that need to run on a different thread, audio processing functions, simple but highly parallelizable functions like shaders, etc.

It could potentially be useful for some React components that need to execute at extreme performance.

Installation

This PR adds an optional dependency on the llvm-node project which contains node bindings to LLVM.

Downstream users of prepack don't automatically install these dependencies. Instead they have to be manually installed in the parent project. For this reason, the prepack CLI lazily requires these modules and prints an error message if they're not installed.
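The lazy-require pattern can be sketched as follows. This is not the actual prepack CLI code: requireOptional and loadLLVM are hypothetical names, and the error text is made up for the example. It assumes a CommonJS module where require is available.

```javascript
// Hypothetical helper: load an optional dependency only when it is needed.
function requireOptional(moduleName, installHint) {
  try {
    return require(moduleName);
  } catch (err) {
    if (err && err.code === "MODULE_NOT_FOUND") {
      throw new Error(
        `Optional dependency "${moduleName}" is not installed. ${installHint}`
      );
    }
    throw err; // some other failure, e.g. a native build error
  }
}

// The module is only resolved when the LLVM path is actually taken.
function loadLLVM() {
  return requireOptional("llvm-node", "Try: yarn add nan && yarn add llvm-node");
}
```

Because loadLLVM() is only called when an LLVM flag is used, users who never touch the LLVM backend never hit the missing-module error.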

llvm-node requires both cmake and LLVM to be installed. It depends on the nan project, which should install automatically, but I had to manually install nan first for some reason.

macOS installation instructions:

brew install cmake
brew install llvm
yarn add nan
yarn add llvm-node

Additionally, running the yarn test-llvm command requires the lli tool (the LLVM interpreter) to be available on the PATH.

Building a Native Program

Compile to LLVM bitcode:

prepack filename.js --emitLLVM --out filename.bc

Compile to native assembly:

llc filename.bc -o filename.s

Link the program to a native executable:

gcc filename.s -o filename

Run it:

./filename

Debug by printing the LLVM IR assembly language code:

prepack filename.js --emitLLVMAssembly
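The build steps chain together: each command consumes the file the previous one produced. A small sketch that lists the commands for a given base name makes the file flow explicit. buildSteps is a hypothetical helper; it only prints the commands and does not run them.

```javascript
// Hypothetical helper: list the commands for building `<base>.js` natively.
function buildSteps(base) {
  return [
    ["prepack", `${base}.js`, "--emitLLVM", "--out", `${base}.bc`], // JS -> LLVM bitcode
    ["llc", `${base}.bc`, "-o", `${base}.s`], // bitcode -> native assembly
    ["gcc", `${base}.s`, "-o", base], // assemble and link
    [`./${base}`], // run the executable
  ];
}

for (const step of buildSteps("filename")) {
  console.log(step.join(" "));
}
```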

Future Work

  • [x] Model strings as stack allocations and allow them to be passed as pointers.
  • [ ] Expose ArrayBuffer as stack allocations and allow them to be passed as pointers.
  • [ ] Allow optimized residual functions to be passed as callbacks. Must not mutate global module state.
  • [ ] Bridge TypedObjects to some kind of memory managed mechanism for passing rich objects to C++.
  • [ ] Precompile regular expressions or link to an external library.
  • [ ] Map Math methods to LLVM operations.
  • [ ] Track float32 types returned by Math.fround. Allow functions to return float32.
  • [ ] Implement built-in methods on strings etc. in JavaScript.
  • [ ] Implement BigInt spec as Int64.

sebmarkbage avatar Jul 16 '18 07:07 sebmarkbage

Basically the idea is that the main function and any residual function are “actors” that process some data. They’ll use arena allocation and are expected to be short lived.

A runtime outside of these can control the memory management of long lived objects. That allows for parallelism and more efficient memory management outside of these “worklets”.

sebmarkbage avatar Jul 16 '18 08:07 sebmarkbage

Fantastic work!

You bypass the existing serializer but still find the existing SerializationContext useful, and then add some hacks to work with the existing BabelNodeExpressions. I wonder if that is the first thing we should clean up here: make the SerializationContext generic rather than BabelNodeExpression-specific, and generally tidy it up, since it really just grew out of immediate needs.

Also, I wonder if you'd soon need something like the ResidualHeapVisitor when you want to support objects. The visitor computes some useful information.

NTillmann avatar Jul 16 '18 10:07 NTillmann

Yea, this exercise really shows where our abstractions leak and where they don't. The generators are fairly flexible but still a bit leaky. There are some things in there that aren't really necessary from the interpreter's point of view, but the interpreter has a hard dependency on them.

E.g. creating intermediate variables happens in the interpreter right now. The generator also inserts temporary variables itself. The rest of the system is essentially SSA so it gets a little awkward to manage both. In this PR I just undo this by storing my own variable map and undoing the temporary assignment. We should try to move that concept out to be completely isolated in the serialization pass instead of interleaved.

Regarding the BabelNodeExpression hack, while the SerializationContext has an unfortunate dependency on it, the deeper issue is actually with AbstractValue whose build node depends on us materializing nodes before we actually know what operation we're serializing. That's the one we need to think a bit about.

I originally expected LLVM to help me with much of what the visitor does, but it is lacking in some areas so yea I might need a pre-processing pass like the ResidualHeapVisitor.

sebmarkbage avatar Jul 16 '18 16:07 sebmarkbage

And as a plus, the ResidualHeapVisitor is completely ignorant of BabelNodeExpressions.

Ideally, I'd like to move out all build nodes to a Babel-specific place, and instead place specific named instructions in the generator. That would allow printing the generator tree in some nice assembly format, and then different backends can be plugged in more easily.
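To make the idea concrete, here is a toy sketch of named instructions with a backend-neutral printer and one pluggable backend. The instruction shape ({ op, dest, args }), the opcode names, and both functions are invented for this example and do not reflect Prepack's actual generator format.

```javascript
// Hypothetical instruction list: each entry names an operation instead of
// carrying a pre-built Babel node.
const instructions = [
  { op: "ADD", dest: "t0", args: ["a", "b"] },
  { op: "MUL", dest: "t1", args: ["t0", "t0"] },
  { op: "RETURN", args: ["t1"] },
];

// Backend-neutral dump in an assembly-like format.
function printAssembly(list) {
  return list
    .map(i => (i.dest ? `${i.dest} = ` : "") + `${i.op} ${i.args.join(", ")}`)
    .join("\n");
}

// One possible backend: emit JavaScript source text.
const jsOperators = { ADD: "+", MUL: "*" };
function emitJavaScript(list) {
  return list
    .map(i =>
      i.op === "RETURN"
        ? `return ${i.args[0]};`
        : `var ${i.dest} = ${i.args[0]} ${jsOperators[i.op]} ${i.args[1]};`
    )
    .join("\n");
}
```

An LLVM backend would walk the same list and emit IR instead, without either backend needing to know about Babel nodes.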

NTillmann avatar Jul 16 '18 16:07 NTillmann

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours has expired.

Before we can review or merge your code, we need you to email [email protected] with your details so we can update your status.

facebook-github-bot avatar Jul 25 '18 21:07 facebook-github-bot

@NTillmann Are the two serialisers going to merge sometime in future? Coming from a contributor's perspective, what changes can we expect in the current serialiser?

ManasJayanth avatar Aug 11 '18 05:08 ManasJayanth

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

facebook-github-bot avatar Aug 11 '18 05:08 facebook-github-bot