pikelet
pikelet copied to clipboard
Compiler backend
It would be nice to actually compile Pikelet programs!
Initial sketch of passes
We'll probably want some intermediate representations like MLton to get the higher level Pikelet code to be in better shape for passing to a back-end.
- [x] Raw - check/infer/elaborate -> Core
- fill in holes
- pass instances and implicit args explicitly
- add explicit coercions
- [ ] Core - A-Normalize -> A-Normal Form (#91)
- make evaluation order and control flow explicit
- [ ] A-Normal Form - Closure Conversion -> Closure Converted
- separate anonymous functions into a pair of:
- a function pointer
- the data containing the environment
- separate anonymous functions into a pair of:
- [ ] Closure Converted - ??? -> SSA
- [ ] SSA - ??? -> ...
- [ ] ... - Codegen -> LLVM/Cretonne/Bytecode
Unresolved questions
We would like to do more work on the type system to make efficient code-gen easier. For example tracking coeffects such as usage mutliplicities (#7), and data flow annotations (allowing us to partially evaluate code at compile time). Regions would also be handy at some stage.
Another issue we'll have to face at some time is garbage collection. Ultimately it would be nice to not have to force users to use a specific allocation strategy, but figuring out how this interacts with multiplicities might be tricky. Until we have a better idea it might be be easier just to use something like the Boehm-Demers-Weiser GC in the interim.
Possible codegen targets
Some possible targets for codegen are as follows:
- LLVM (the inkwell crate looks handy!)
- Cretonne
- Web Assembly (could be covered by Cretonne or LLVM)
- Some sort of Typed Assembly Language (TAL)?
- Custom JIT (see runtimes-WG for some ideas)
A JIT might be handy as it would allow us to embed Pikelet in existing Rust codebases. This could help us gain adoption without needing to convince folks to switch to an entirely new language in the mid-term.
Resources:
- Implementing and Optimizing a Simple, Dependently-Typed Language
- Practical Implementation of a Dependently Typed Functional Programming Language
- Compiling A Functional Language
- Optimizing Closures in O(0) time
- MLton: CompilerOverview
- Miniphases: Compilation using Modular and Efficient Tree Transformations
- A Systematic Study of Functional Language Implementations
- FunTAL: Reasonably Mixing a Functional Language with Assembly
- Linking Types for Multi-Language Software: Have Your Cake and Eat It Too
- From System F to TAL
- The Essence of Closure Conversion
- Using Destination-Passing Style to Compile a Functional Language into Efficient Low-Level Code
- Destination-Passing Style for Efficient Memory Management
- jdreaver's big comment of resources on, "Is LLVM a good backend for Functional languages?"
- SinScheme's LLVM IR emmission (closure-convert.rkt)
- SinScheme's compiler docs
- TIL: A Type-Directed Optimizing Compiler for ML
- Thorin: CPS + Sea of nodes Graph IR
Example compilers in Rust
- https://github.com/sunfishcode/simplejit-demo
- https://github.com/GeorgeKT/menhir-lang
- https://github.com/jDomantas/plank
- https://github.com/agatan/minicom
- https://github.com/antoyo/tiger-rs
- https://github.com/BookOwl/kaleidoscope-rs
- https://gitlab.com/yorickpeterse/inko
- https://github.com/boomshroom/NovaLang
- https://github.com/dinfuehr/dora
Sixten is another functional language with dependent types and unboxed data types. Might be worth looking there for inspiration too!
Edwin Brady's thesis, Practical Implementation of a Dependently Typed Functional Programming Language, may also prove a useful resource here.
I'd definitely want to have some nice documentation about the compiler like MLton's at some stage! This would make contributing easier for newcomers.
Nice advice from @evincarofautumn:
A good way to build a compiler in my experience is to start with a simple program, like “hello world”, and make sure that end-to-end compilation works and produces the right output for just that literal program, then incrementally add stuff to the program—change the string from “hello world” to something else, add computations, take advantage of each language feature in turn and get it through the whole pipeline
So yeah, very much “one thing at a time”; at each step you have a full working (simple) compiler
Minor correction: Sixten has unboxed types but not dependent types (at least the Readme doesn't say so).
It actually does have dependent types! @ollef just isn't promoting it that way (yet?). If you have a look at the core syntax, you'll see that it describes a dependently typed lambda calculus.
-- | Expressions with meta-variables of type @m@ and variables of type @v@.
data Expr m v
= Var v
| Meta m (Vector (Plicitness, Expr m v))
| Global QName
| Con QConstr
| Lit Literal
| Pi !NameHint !Plicitness (Type m v) (Scope1 (Expr m) v)
| Lam !NameHint !Plicitness (Type m v) (Scope1 (Expr m) v)
| App (Expr m v) !Plicitness (Expr m v)
| Let (LetRec (Expr m) v) (Scope LetVar (Expr m) v)
| Case (Expr m v) (Branches Plicitness (Expr m) v) (Type m v)
| ExternCode (Extern (Expr m v)) (Type m v)
| SourceLoc !SourceLoc (Expr m v)
deriving (Foldable, Functor, Traversable)
-- | Synonym for documentation purposes
type Type = Expr