rusp Implementing JIT

trafficstars

I am interested in implementing JIT for Rusp, sooner rather than later, as we can then focus towards a language that uses LLVM better rather than trying to shoe-horn JIT in later (a seemingly common problem). I'm trying to figure out how to best do this at the moment, so I am going to outline some related ideas here for discussion.

Basic Facts

LLVM, being a more general system, essentially has "function" as it's basic unit, this isn't actually much of a problem, especially for a dynamically typed language, as all top-level expression will just be wrapped in an anonymous function, which is then executed. Since being dynamically typed means that all types are actually just a tagged union, we don't have to try to figure out the type of the expression before wrapping it.

It is possible register global values, including functions, with the LLVM ExecutionEngine. This will make Rust integration much easier.

Translation

All Rusp expression should be translated into LLVM IR as they are parsed, as in, each entire s-expr should be converted to LLVM IR. The two core primitives that are relevant here are def and fn as they are used from user code, but need to modify the LLVM module. As such, it may be worthwhile special casing just those two primitives in order to make that much easier.

Otherwise translation would go as generally expected, consisting mostly of function calls out to the runtime. There is a lot of scope for optimization here, but that can be added later.

Function Call ABI

The ABI should probably look a bit like this (in LLVM IR Assembly):

declare void @fun(%ruspbox* sret %retval, %ruspenv* %env, %ruspbox %args);

Side Note: in LLVM IR, you can define struct definitions and assign them names, this is what %ruspenv and %ruspbox are.

Where %ruspenv* is a pointer to the environment (can be null) and %ruspbox* is a pointer to a rusp value. We would pass a return value as an argument, and the arguments would just be a rusp list. The return value is just void, but it could be a value, if we want to implement some sort of basic unwinding/error handling.

Checking that the callee and caller match up should be done by the callee (i.e. the function being called), as function re-definitions may change whether a call is legal or not and ultimately it is the callee that has the final say as whether it has been called properly.

Type Boxes

Having dynamic types means that we need to attach type information to the values so we can inspect them at runtime. This is what we have now with the Value enum. However, using the enum directly is not particularly feasible, as we would then have to handle a significant amount of Rust implementation details, such as the layout of the enum, the header on strings and the header on vectors. This isn't something I particularly want to do.

I suggest then, that since we can rely on the layout of structs (especially with the #[packed] attribute), we implement a basic tagged union for our types:

#[packed]
struct RuspBox {
  tag: u8,
  priv pad: [u8,..3],
  val: u64
}

val is a u64 in order to be capable of holding the primitive types we will have: int (64-bit signed integer), float (double-precision floating-point number), bool (boolean value) and atom (self-representing symbol). That field will also hold a pointer to the other primary types: str (string), list (list value) and obj ("object" type). If we are willing to accept the limitation of 24 bits of space, then we could use the 3 padding bytes to store a length for the str and list types, simply masking the tag out.

Macros

Macros can be implemented the same way as everything else, with the same idea that is currently used being used to implement them in JIT, simply functions that take and return a list, the only difference is that the translator will need to check to see if a call expression is actually a macro and then call it in-place if it needs to.

Jun 04 '13 07:06 Aatch

This sounds good, although how will this interact with garbage collection? Can we leverage the Rust GC to collect the JIT'd Rusp values? (I have literally no idea how this works.)

We could also get 30 bits of length for str & list, via: #[packed] struct RuspBox { tag: u32, val: u64 } with an encoding similar to the following:

00xxx...xxx: str with length xxx...xxx.
01xxx...xxx: listwith length xxx...xxx.
1yyyy...yyy: value with tag yyyy...yyy.

Or even have a "short list/str" tag and a "long list/str" tags, the former stores the length in RuspBox, and the latter with the pointer. Of course, both of these are possibly too complicated to be worth it.

An entirely different approach would just have a single u64 and use pointer tagging, but this is probably quite platform specific. (This would also restrict the ints to 63 bits, although this probably isn't too horrible.)

Jun 04 '13 09:06 huonw

(Hm, actually, the pointer tagging works easily on 32-bit platforms (the upper 32-bits are entirely unused), leaving only x86-64, so it's probably not that bad. I don't know.)

Jun 04 '13 09:06 huonw

This looks great @Aatch, thanks for putting so much time into laying this out clearly. From the little I know of this area, it looks quite nicely thought out. This is all very new to me, so it's great to have your assistance. :)

Maybe @z0w0 might be able to give some feedback? He has expressed some interest in Rusp on IRC, and I know he's done at least a little work with JIT in the past.

Jun 04 '13 13:06 brendanzab

By the way, you mention that trying to "shoe-horn JIT in later" is "a seemingly common problem". Could you elaborate on that? What issues have other languages had in the past that we can learn from?

Jun 04 '13 13:06 brendanzab

@bjz the main issues I've seen regarding LLVM's JIT is when people try to take an existing language and feed it in LLVM. In some cases this works fine, but often the language is designed and implemented in a way that makes using LLVM difficult.

For us, this just means making sure that the interpreter and execution engine are tightly integrated. This will make it easier to implement things like macros efficiently. It also affects our representation of user-defined types, as there is a bit of a tradeoff between performance and usability that we will need to decide on.

@huonw we will probably have to implement our own GC for this, ideally utilising LLVM's existing garbage collection support. How this will affect the representation of values though, I don't know.

Jun 04 '13 21:06 Aatch

Also, how will this go with respect to embeddability? Presumably we'd also want a non-LLVM interpreter?

Jun 05 '13 05:06 huonw

rusp rusp copied to clipboard

Implementing JIT

Basic Facts

Translation

Function Call ABI

Type Boxes

Macros

rusp
rusp copied to clipboard