cs6120
Project Proposal: Concurrency Extension for Bril
What will you do?

Extend the Bril IR with concurrency primitives—specifically, add:

1. spawn: launches a new thread running a given Bril function.
2. join: waits for a thread to complete.
Also implement compiler passes that analyze or optimize concurrent code, focusing on thread-escape analysis (to identify data that remains thread-local). If time permits, add a parallel loop transformation pass that detects simple data-parallel loops and rewrites them into a parfor-style concurrency construct.
How will you do it?

1. Extend the Bril specification
   • Modify Bril's textual/JSON grammar to include spawn and join (and possibly parfor).
   • Update the Bril parser (bril-txt) and TypeScript definitions (bril.ts) to recognize the concurrency instructions.
2. Implement concurrency in the interpreter
   • In brili.ts, create a runtime mechanism to spawn threads (e.g., Node.js worker threads or a simpler concurrency simulation).
   • Maintain a thread table so join can wait on thread completion.
3. Add a compiler pass (thread-escape analysis)
   • Build on the escape-analysis concept (Blanchet et al., PLDI 1999) to detect whether allocated data is accessed by multiple threads.
   • If an object never escapes the creating thread, treat it as thread-local—no concurrency overhead or atomic operations needed.
4. (Stretch goal) Parallel loop transformations
   • If time permits, implement a pass that scans for loops with no cross-iteration data dependencies.
   • Rewrite those loops into a parfor or multiple spawn calls, following classic HPC compiler approaches (Allen & Kennedy, Optimizing Compilers for Modern Architectures; Muchnick, Advanced Compiler Design and Implementation).
   • Demonstrate performance improvements on data-parallel benchmarks.
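To make the thread-escape step concrete, here is a minimal sketch of what such a pass could look like over Bril's JSON instruction format. It assumes instructions shaped like `{op, dest?, args?}` from the memory extension (alloc, store, id, ptradd) plus the proposed spawn opcode; the function name and the exact propagation rules are illustrative, not the final pass's design.

```typescript
// Conservative thread-escape analysis sketch: a pointer "escapes" if it
// may become reachable from another thread. All names are illustrative.
interface Instr {
  op: string;
  dest?: string;
  args?: string[];
}

function threadEscape(instrs: Instr[]): Set<string> {
  const escaping = new Set<string>();
  let changed = true;
  while (changed) { // fixpoint: aliases of escaping pointers escape too
    changed = false;
    for (const instr of instrs) {
      const args = instr.args ?? [];
      if (instr.op === "spawn") {
        // In the proposed JSON encoding, args[0] is the function name;
        // the remaining args become visible to the new thread.
        for (const a of args.slice(1)) {
          if (!escaping.has(a)) { escaping.add(a); changed = true; }
        }
        continue;
      }
      if (instr.op === "store" && args.length === 2 && escaping.has(args[0])) {
        // Storing a value through an escaping pointer publishes that value.
        if (!escaping.has(args[1])) { escaping.add(args[1]); changed = true; }
        continue;
      }
      if ((instr.op === "id" || instr.op === "ptradd") &&
          args.some(a => escaping.has(a)) &&
          instr.dest && !escaping.has(instr.dest)) {
        escaping.add(instr.dest); // aliasing propagates escape
        changed = true;
      }
    }
  }
  return escaping;
}
```

An allocation whose result never lands in this set can then be treated as thread-local, which is exactly the property the pass is after.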
How will you empirically measure success?
1. Correctness tests
   • Demonstrate small concurrent Bril programs (multiple spawns, parallel computations).
   • Check for correct behavior (threads run in parallel, join properly, no double-join errors).
2. Performance benchmarks
   • Write or adapt micro-benchmarks that benefit from parallelism.
   • Measure how thread-escape analysis or loop parallelization improves performance (e.g., speedups vs. sequential code).
3. Qualitative complexity
   • Assess how concurrency changes the compiler pipeline.
   • Describe the design trade-offs in the concurrency IR instructions, the analysis complexity, and any pitfalls encountered.
Backup! If I'm unable to get the above working convincingly or just run out of time :(
What will you do?
Implement a register allocation pass for Bril that simulates a real backend by mapping Bril’s unbounded virtual variables to a finite set of machine registers. As suggested, the approach will “fake” a limited register set by renaming variables to bounded names, with any variable that cannot be assigned a register being relegated to “stack” storage.
Target SSA-form Bril and implement classic allocation algorithms:

• Graph-coloring register allocation inspired by Chaitin's approach.
• An SSA-based allocation method that simultaneously "exits" SSA while assigning registers.
How Will You Do It?

1. Implement register allocation pass(es):
   • Graph-coloring approach:
     • Build an interference graph from live-range analysis.
     • Use a greedy graph-coloring heuristic (or a Chaitin/Briggs-style approach) to assign registers from a limited set.
     • Spill variables that cannot be colored (i.e., assign them to stack memory).
   • SSA-based register allocation:
     • Implement an algorithm that "exits" SSA and performs register allocation simultaneously.
     • Use heuristics or an iterative coalescing technique to reduce the number of move instructions.
2. Integration and testing:
   • Modify the Bril interpreter or a backend target (e.g., generating C or LLVM IR) to simulate the effect of finite registers.
   • Create a suite of test Bril programs to verify that the allocation pass produces correct results.
   • Compare the performance (e.g., instruction count, spill frequency) and code quality of the different register allocation strategies.
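The graph-coloring core of the plan can be sketched in a few dozen lines. This is a simplified Chaitin-style allocator operating on a pre-built interference graph; the type names and the "spill the highest-degree node" heuristic are illustrative choices, not the proposal's final design (a Briggs-style variant would optimistically push spill candidates instead of giving up on them).

```typescript
// Chaitin-style simplify/select with pessimistic spilling, assuming the
// interference graph was already built from liveness. Illustrative only.
type Graph = Map<string, Set<string>>;

function colorGraph(graph: Graph, k: number): { regs: Map<string, number>; spilled: string[] } {
  // Work on a mutable copy so the original graph survives for "select".
  const work: Graph = new Map(
    [...graph].map(([v, ns]) => [v, new Set(ns)] as [string, Set<string>])
  );
  const stack: string[] = [];
  const spilled: string[] = [];
  while (work.size > 0) {
    // Simplify: remove any node with fewer than k remaining neighbors.
    const trivial = [...work.keys()].find(v => work.get(v)!.size < k);
    const victim = trivial ??
      [...work.keys()].reduce((a, b) => work.get(a)!.size >= work.get(b)!.size ? a : b);
    if (trivial === undefined) spilled.push(victim); // no simplifiable node: spill one
    else stack.push(victim);
    work.delete(victim);
    for (const ns of work.values()) ns.delete(victim);
  }
  // Select: pop nodes and give each the lowest color unused by colored neighbors.
  const regs = new Map<string, number>();
  while (stack.length > 0) {
    const v = stack.pop()!;
    const used = new Set([...graph.get(v)!].map(n => regs.get(n)).filter(c => c !== undefined));
    let c = 0;
    while (used.has(c)) c++;
    regs.set(v, c);
  }
  return { regs, spilled };
}
```

For example, the chain a–b–c needs only two registers (a and c can share one), while a triangle of mutually interfering variables forces a spill when k = 2.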
How Will You Empirically Measure Success?

1. Correctness testing:
   • Run a set of Bril programs through the register allocation pass and compare the execution results with the original unallocated version.
   • Verify that no variables are lost and that spilled variables are correctly managed.
2. Empirical evaluation:
   • Measure the number of spilled variables versus those successfully assigned to registers across various benchmark programs.
   • Evaluate the performance impact in a simulated environment by comparing execution speed and code size.
   • Compare the graph-coloring and SSA-based approaches in terms of register usage efficiency and runtime overhead.
3. Qualitative analysis:
   • Document design decisions and any trade-offs encountered.
   • Provide an experience report discussing the ease of integration, algorithmic challenges, and potential for reuse in backends for real machine code generation.
Awesome! The spawn/join threading extension actually sounds super interesting, and it seems like it should also be accomplishable: that is, if all you do is get the reference interpreter working and make an attempt at escape analysis, that would be plenty. Sounds super interesting to report on! I think it would be a good idea to have a plan to do that and conduct your empirical evaluation before trying for any stretch goals.
Modify Bril’s textual/JSON grammar to include spawn and join (and possibly parfor).
Before getting started, could I ask you to please think through what these instructions will consist of? As in, what are the arguments to these instructions, and a one-sentence description of how they will work? I would like to help think through them now, before you get into implementing anything.
The other big question that you'll need to decide before getting started: what state, if any, will threads share? Here is a recommendation:
- No local variables are shared. When you spawn, you get a copy of the current call stack that is completely distinct from the original one.
- Memory allocations (from Bril's memory extension) are shared. Data races on elements of these allocations, however, are undefined behavior.
Please also decide on this now before proceeding farther.
In brili.ts, create a runtime mechanism to spawn threads (e.g. Node.js worker threads or some simpler concurrency simulation).
If you want to extend the reference interpreter, you will need to use something that is available in Deno. Fortunately, it looks like it supports Web Workers.
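For the "simpler concurrency simulation" option, the thread table can be as small as a map from IDs to promises. This is only a sketch with illustrative names: a real interpreter would back each entry with a Deno Web Worker, but the spawn-returns-an-ID / join-awaits-completion shape is the same either way.

```typescript
// Promise-backed thread table: spawn hands back a fresh ID, join awaits
// that thread and rejects double joins. Illustrative sketch only.
type ThreadFn = (...args: unknown[]) => Promise<unknown>;

class ThreadTable {
  private next = 0;
  private threads = new Map<number, Promise<unknown>>();

  // Start fn "on its own call stack" and return a unique thread ID.
  spawn(fn: ThreadFn, ...args: unknown[]): number {
    const tid = this.next++;
    this.threads.set(tid, fn(...args));
    return tid;
  }

  // Block (await) until the thread with this ID finishes.
  async join(tid: number): Promise<void> {
    const t = this.threads.get(tid);
    if (t === undefined) throw new Error(`join on unknown thread ${tid}`);
    await t;
    this.threads.delete(tid); // a second join on the same ID is an error
  }
}
```

Deleting the entry on join also gives the "no double-join" error behavior from the evaluation plan for free.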
Before getting started, could I ask you to please think through what these instructions will consist of? As in, what are the arguments to these instructions, and a one-sentence description of how they will work? I would like to help think through them now, before you get into implementing anything.
Spawn: The spawn instruction creates a new thread to execute the function identified by func with the provided arguments. It does so by copying the current call stack so that all local variables are private to the spawned thread. The instruction returns a unique thread ID (of type int).
tid: int = spawn func arg1 arg2 ... argN;
{
"op": "spawn",
"args": ["func", "arg1", "arg2", ..., "argN"],
"dest": "tid",
"type": "int"
}
Join: The join instruction blocks the current thread until the thread corresponding to tid completes its execution.
join tid;
{
"op": "join",
"args": ["tid"]
}
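Putting the two instructions together, a minimal example program might look like the following. This is only a sketch: it assumes Bril's memory extension for the shared result cell, and it writes the spawned function with Bril's usual @-prefix for function names (the textual form above leaves the prefix implicit).

```bril
@sum(dest: ptr<int>, x: int, y: int) {
  s: int = add x y;
  store dest s;
}

@main {
  one: int = const 1;
  p: ptr<int> = alloc one;
  two: int = const 2;
  three: int = const 3;
  tid: int = spawn @sum p two three;
  join tid;
  v: int = load p;
  print v;
  free p;
}
```

The pointer p escapes to the spawned thread, the join makes the store visible before the load, and the program prints 5.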
The other big question that you'll need to decide before getting started: what state, if any, will threads share? Here is a recommendation:
- No local variables are shared. When you spawn, you get a copy of the current call stack that is completely distinct from the original one.
- Memory allocations (from Bril's memory extension) are shared. Data races on elements of these allocations, however, are undefined behavior.
I like this approach. We'll keep heap memory shared while maintaining isolated local variables between threads. When spawning a thread, it'll get its own copy of the call stack to ensure complete separation of execution contexts. Shared memory allocations with undefined behavior for data races is a pragmatic choice that matches how many real-world threading models work.
Please also decide on this now before proceeding farther.
In brili.ts, create a runtime mechanism to spawn threads (e.g. Node.js worker threads or some simpler concurrency simulation).
If you want to extend the reference interpreter, you will need to use something that is available in Deno. Fortunately, it looks like it supports Web Workers.
Deno's Web Workers seem great for this. Each worker can maintain its own call stack while we implement mechanisms to share the heap memory between them.
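One way to share the heap across workers is to back it with a SharedArrayBuffer, which both Deno and Node can hand to workers. The sketch below shows the idea with a bump-pointer allocator; the heap layout and function names are assumptions for illustration, not the interpreter's actual design.

```typescript
// Heap backed by a SharedArrayBuffer: every worker sees the same words.
// Allocation uses Atomics because any thread may allocate concurrently;
// plain loads/stores stay racy, matching the "data races are undefined
// behavior" semantics. Illustrative sketch only.
const WORDS = 1024;
const heap = new Int32Array(new SharedArrayBuffer(WORDS * 4));

// Word 0 holds the bump pointer, so user data starts at word 1.
function alloc(words: number): number {
  const base = Atomics.add(heap, 0, words) + 1; // old bump value, past word 0
  if (base + words > WORDS) throw new Error("out of heap");
  return base;
}

function store(ptr: number, value: number): void {
  heap[ptr] = value; // non-atomic on purpose: races here are UB
}

function load(ptr: number): number {
  return heap[ptr];
}
```

Only the allocator needs atomicity; leaving ordinary loads and stores non-atomic keeps the common case cheap while still matching the chosen memory model.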
Fantastic; this sounds great! And the definitions of the two new instructions both look good.
Here is one tiny suggestion for the language extension: for thread IDs, it would be possible to invent a new opaque type instead of making these ints. That is, the spawn instruction might look like this:
tid: thread = spawn func arg1 arg2 ... argN;
And that new thread type would be the argument to join as well. This would just be a slightly safer way to set things up so it's impossible to "forge" thread IDs with integer arithmetic.
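On the TypeScript side (bril.ts / brili.ts), one way to mirror that opaque type is a "branded" number, so the compiler rejects plain ints where a thread ID is expected. The names here are illustrative, not part of the PR.

```typescript
// Branded thread IDs: at runtime they are plain numbers, but the type
// system refuses to accept an unbranded int, so IDs can't be forged
// with integer arithmetic. Illustrative sketch only.
declare const threadBrand: unique symbol;
type ThreadId = number & { readonly [threadBrand]: true };

// Only the interpreter's spawn implementation should mint ThreadIds.
function mintThreadId(raw: number): ThreadId {
  return raw as ThreadId;
}

function join(tid: ThreadId): number {
  // A real implementation would wait here; the type alone rules out join(42).
  return tid;
}

const tid = mintThreadId(0);
join(tid);    // ok
// join(7);   // type error: a bare number is not a ThreadId
```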
Bril PR: https://github.com/sampsyo/bril/pull/416