http-parser icon indicating copy to clipboard operation
http-parser copied to clipboard

[Idea] Rewriting in LLVM DSL

Open indutny opened this issue 7 years ago • 21 comments

I know it sounds crazy, but bear with me :wink:

What would you think about rewriting whole project using some JavaScript tool to compile a DSL to a LLVM IR?

I've several reasons for this:

  • Getting rid of huge switch statement that isn't optimized well, and putting separate states into separate procedures with optimized jumps between the states
  • Having architecture-specific vector comparisons that are easy to use

It doesn't sound to complicated, and may finally make our codebase shine.

cc @bnoordhuis @mscdex

indutny avatar Feb 11 '18 03:02 indutny

Here is an example of what it might look like: https://github.com/indutny/llparse/blob/master/test/fixtures/example.js

indutny avatar Feb 11 '18 20:02 indutny

If you're thinking of writing some lexer generator, I could definitely get behind that - I've had the same thought more than once - but why LLVM IR specifically and wouldn't we be reinventing ragel?

bnoordhuis avatar Feb 12 '18 13:02 bnoordhuis

@bnoordhuis there're probably better solutions out there, but I always wanted to write something that compiles to LLVM IR. Is it a valid excuse?

On a more serious note, I'd like every state implementation to live in a separate procedure that tail-calls other procedures in the most of the cases. This might turn out to be faster than having a single grand dispatch.

indutny avatar Feb 12 '18 13:02 indutny

Also on the feature list is having a low-level trie implementation that works seamlessly.

indutny avatar Feb 12 '18 13:02 indutny

It could probably work in a way similar to Ragel, however it'd have to rely on LLVM to inline the C code into compiled nodes of the state machine graph.

indutny avatar Feb 12 '18 13:02 indutny

You can't replace http-parser with something that spits out LLVM IR, that leaves too many users out in the cold. Something that compiles to C or has different back-ends could work though (but then you're half-way on the road to reimplementing ragel or re2c.)

a low-level trie implementation

What for?

bnoordhuis avatar Feb 12 '18 16:02 bnoordhuis

We obviously need a trie-like state machine to replace hand-written state code for the methods.

indutny avatar Feb 12 '18 21:02 indutny

Not sure I follow. For lexers, you compute the DFA or NFA and then emit state tables or goto-based code. I guess you could implement it as a trie but I don't know why you would.

bnoordhuis avatar Feb 12 '18 21:02 bnoordhuis

I didn't mean in-memory trie, as a data-structure. Rather a compiled code consisting of branches and states linked in a trie-like structure.

indutny avatar Feb 12 '18 21:02 indutny

Ah, okay. You get that for free with a DFA but I guess a trie is pretty much a DFA encoded as a tree.

bnoordhuis avatar Feb 12 '18 22:02 bnoordhuis

I think I'm on the edge of giving up. Just figured out that musttail may not work on arm.

indutny avatar Feb 13 '18 06:02 indutny

Nvm, it works! 👍

indutny avatar Feb 13 '18 06:02 indutny

Took few iterations, but I think I've reached some intermediate milestone here: https://github.com/indutny/llparse/blob/master/test/api-test.js

What do you think, @bnoordhuis ?

indutny avatar Feb 17 '18 08:02 indutny

Here is what it produces: https://gist.github.com/indutny/ac9eedc036a43098b2ad0bf7b63e9f65

indutny avatar Feb 17 '18 09:02 indutny

@bnoordhuis better example here: https://github.com/indutny/llparse/tree/master/examples/http

indutny avatar Feb 17 '18 23:02 indutny

API-wise it looks real nice and the generated code looks tight apart from a hiccup:

$ make -C examples/http
node index.js > http.ll
cc -g3 -Os -flto -fvisibility=hidden -Wall http.ll main.c -o http
warning: overriding the module target triple with x86_64-apple-macosx10.13.0 [-Woverride-module]
1 warning generated.
cannot guarantee tail call due to mismatched parameter counts
  %15 = musttail call fastcc i8* @http_parser__invoke_on_complete(%http_parser_state* %0, i8* %14, i8* %2)
cannot guarantee tail call due to mismatched parameter counts
  %40 = musttail call fastcc i8* @http_parser__invoke_on_complete(%http_parser_state* %0, i8* %39, i8* %2)
cannot guarantee tail call due to mismatched parameter counts
  %10 = musttail call fastcc i8* @http_parser__method(%http_parser_state* %0, i8* %1, i8* %2, i32 0)
LLVM ERROR: Broken module found, compilation aborted!
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [http] Error 1

$ cc -v
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin17.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Do you have plans for a C back-end?

bnoordhuis avatar Feb 19 '18 22:02 bnoordhuis

The hiccup is due to a bug in -flto, see: https://bugs.llvm.org/show_bug.cgi?id=36441 . I've just pushed a fix to a Makefile in that example.

Here is some real work on porting http parser: https://github.com/indutny/llhttp.


I don't really have plans for a C back-end yet, but will likely have to explore this eventually.

indutny avatar Feb 19 '18 22:02 indutny

Note that despite not being present in the example, I've just introduced an API for creating callbacks using LLVM IR (instead of a reference to C function): https://github.com/indutny/llparse/blob/81578c8f41a926514a6625b2a9c9941218728408/test/fixtures/index.js#L85-L121

After I'll add an API to extend the state structure from the user code, these compiled code chunks will be able to update the state without ever calling the C-land. C-land calls are sort of expensive right now, as they can't be inlined (due to various machine-specific flags that has to be matched).

Additionally, I plan to introduce "mark"s that would work in the same way as they do in http_parser.c

indutny avatar Feb 19 '18 22:02 indutny

@bnoordhuis and so we got spans (instead of "mark"s): https://github.com/indutny/llhttp/blob/38fb3aed8c7290aec389a109e2e8cd1c004872c4/lib/llhttp/url.js#L12 . They can be interleaved if we'd ever want to, and they should be efficient.

indutny avatar Feb 22 '18 06:02 indutny

"Getting rid of huge switch statement that isn't optimized well"

Can the switches be permuted to get speedup on average inputs? Perhaps a utility that takes as input a URI corpus file and outputs optimal permutations for the switches?

chadbrewbaker avatar Mar 27 '18 19:03 chadbrewbaker

@chadbrewbaker there's little point in this, switches like the one in http_parser are optimized into jump table.

indutny avatar Mar 27 '18 20:03 indutny