rv32emu
jit: Background compilation thread
In tiered compilation, the interpreter monitors how often execution jumps to specific code addresses. Once a preset threshold is reached, it initiates JIT compilation. The emulator keeps running in the interpreter while JIT compilation proceeds in a background thread, and switches to the JIT-compiled code once it is ready.
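The threshold check described above can be sketched as a per-target counter. This is only an illustration: `hot_entry_t`, `record_jump`, and the value of `HOT_THRESHOLD` are hypothetical names and numbers, not taken from rv32emu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold; the real value is a tuning decision. */
#define HOT_THRESHOLD 4

/* Hypothetical per-target bookkeeping kept by the interpreter. */
typedef struct {
    uint32_t pc;   /* jump target address */
    uint32_t hits; /* number of jumps observed so far */
    bool queued;   /* true once compilation has been requested */
} hot_entry_t;

/* Called by the interpreter on every jump to e->pc.
 * Returns true exactly once: on the jump that reaches the threshold,
 * which is when the block should be handed to the JIT compiler. */
static bool record_jump(hot_entry_t *e)
{
    assert(e != NULL);
    if (e->queued)
        return false; /* already requested, keep interpreting */
    if (++e->hits < HOT_THRESHOLD)
        return false; /* still cold */
    e->queued = true;
    return true;
}
```

Returning true only once keeps the interpreter from requesting the same block repeatedly while it is still being compiled.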
This emulator utilizes a JIT compiler for executing all RISC-V instructions. When execution begins, each emulator thread is equipped to compile new code. This implementation allows JIT compilation to occur in a background thread while the interpreter continues its execution until compilation is complete. If the emulator encounters a function that hasn't been compiled yet, it acquires a lock on the JIT code backend and attempts to compile the entire function into the JIT backend before resuming execution. Importantly, this lock only restricts other threads from adding new code to the JIT backend during compilation, without impeding their ability to use the JIT backend. Essentially, this means that one thread compiling new code has a minimal impact on other threads, resulting in a lock that imposes little overhead.
I implemented T2C with a background thread and a compilation wait queue. Once a target block meets the condition for launching T2C, it is added to the compilation wait queue. The background thread continuously checks the queue, translates each target block into LLVM IR, and hands the LLVM IR to the LLVM backend.
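A minimal sketch of such a wait queue with POSIX threads follows. It is not the code in this PR: `demo_block_t`, `wait_queue_t`, and all function names are hypothetical, `compile_block` is a stand-in for the LLVM IR translation, and the sketch sleeps on a condition variable rather than polling the queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block record; not rv32emu's actual block type. */
typedef struct demo_block {
    uint32_t pc;
    atomic_bool compiled;    /* set by the background thread */
    struct demo_block *next; /* intrusive link for the wait queue */
} demo_block_t;

typedef struct {
    demo_block_t *head, *tail;
    bool quit; /* set when the emulator shuts down */
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
} wait_queue_t;

static void queue_init(wait_queue_t *q)
{
    assert(q != NULL);
    q->head = q->tail = NULL;
    q->quit = false;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Main thread: enqueue a hot block and return without waiting. */
static void queue_push(wait_queue_t *q, demo_block_t *b)
{
    b->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = b;
    else
        q->head = b;
    q->tail = b;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Stand-in for "translate to LLVM IR and hand it to the LLVM backend". */
static void compile_block(demo_block_t *b)
{
    atomic_store(&b->compiled, true);
}

/* Background thread: pop blocks and compile them outside the lock.
 * Exits only after quit is set and the queue has been drained. */
static void *compiler_thread(void *arg)
{
    wait_queue_t *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (!q->head && !q->quit)
            pthread_cond_wait(&q->nonempty, &q->lock);
        demo_block_t *b = q->head;
        if (!b) { /* quit requested and nothing left to compile */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        q->head = b->next;
        if (!q->head)
            q->tail = NULL;
        pthread_mutex_unlock(&q->lock);
        compile_block(b);
    }
}

/* Main thread: ask the background thread to finish pending work and exit. */
static void queue_shutdown(wait_queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    q->quit = true;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}
```

The lock is held only while linking or unlinking a queue node, so the main thread never waits for a compilation to finish.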
Unlike the original design, the background thread delays the point at which T2C-generated machine code is first invoked, so T1C-generated machine code is invoked more often. However, this design removes compilation overhead from the main thread, because the main thread no longer waits for T2C. The performance of T1C is therefore more influential in this design. I believe the performance will improve further after PR #384.
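To illustrate why T1C code runs more often here, the following sketch shows one way the main thread could choose between tiers without blocking. It assumes a per-block atomic pointer that the background thread fills in when T2C finishes. `tiered_block_t`, `run_block`, `publish_t2c`, and the two `demo_*` functions are hypothetical and only report which tier ran.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*code_fn_t)(uint32_t pc);

/* Hypothetical per-block state for two tiers of generated code. */
typedef struct {
    code_fn_t t1c_code;          /* available as soon as T1C has run */
    _Atomic(code_fn_t) t2c_code; /* NULL until the background thread publishes */
} tiered_block_t;

/* Stand-ins for generated machine code; each returns its tier number. */
static uint32_t demo_t1c(uint32_t pc) { (void) pc; return 1; }
static uint32_t demo_t2c(uint32_t pc) { (void) pc; return 2; }

/* Main thread: use T2C code if it is ready, otherwise fall back to T1C.
 * It never waits for the background compilation. */
static uint32_t run_block(tiered_block_t *b, uint32_t pc)
{
    assert(b != NULL && b->t1c_code != NULL);
    code_fn_t fast = atomic_load_explicit(&b->t2c_code, memory_order_acquire);
    return fast ? fast(pc) : b->t1c_code(pc);
}

/* Background thread: make finished T2C code visible to the main thread. */
static void publish_t2c(tiered_block_t *b, code_fn_t fn)
{
    atomic_store_explicit(&b->t2c_code, fn, memory_order_release);
}
```

Every call made before `publish_t2c` runs goes through the T1C path, which is why T1C quality matters more under this design.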
For benchmark metrics notation, use "rv32emu-tiered" to denote the tiered JIT compilation approach within rv32emu.