rv32emu

Evaluate effectiveness of dynamic superinstructions

Open jserv opened this issue 1 year ago • 2 comments

Superinstructions are a well-known technique for improving interpreter performance. They eliminate jumps between VM operations (interpreter dispatch) and enable further optimizations within the merged code. Using dynamic superinstructions in a RISC-V emulator offers an intriguing blend of traditional JIT compilation and interpreter-based execution, in which a profiler can recommend which superinstructions to form. This strategy could capitalize on the strengths of both methodologies. Let's evaluate the effectiveness of this approach based on several key factors:
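To make the dispatch saving concrete, here is a minimal sketch of a switch-based interpreter with one superinstruction. The opcode names, `struct insn`, and `struct vm` are hypothetical and are not rv32emu's actual IR; the sketch only assumes that a shift followed by an add on the same register is a sequence worth merging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes for illustration; not rv32emu's real IR. */
enum op { OP_ADDI, OP_SLLI, OP_SLLI_ADDI /* superinstruction */, OP_HALT };

struct insn {
    enum op op;
    uint8_t rd, rs1;
    int32_t imm;  /* immediate for ADDI, shift amount for SLLI */
    int32_t imm2; /* second immediate, used only by OP_SLLI_ADDI */
};

struct vm {
    uint32_t x[32];    /* integer registers; x0 is kept at zero */
    size_t dispatches; /* number of trips through the dispatch switch */
};

/* Interpret code until OP_HALT, counting every dispatch. */
void vm_run(struct vm *vm, const struct insn *code)
{
    assert(vm && code);
    for (const struct insn *ip = code;; ip++) {
        vm->dispatches++;
        switch (ip->op) {
        case OP_ADDI:
            vm->x[ip->rd] = vm->x[ip->rs1] + (uint32_t) ip->imm;
            break;
        case OP_SLLI:
            vm->x[ip->rd] = vm->x[ip->rs1] << ((uint32_t) ip->imm & 31u);
            break;
        case OP_SLLI_ADDI:
            /* Same effect as "slli rd, rs1, imm" then "addi rd, rd, imm2",
             * but with a single dispatch and no intermediate register write. */
            vm->x[ip->rd] = (vm->x[ip->rs1] << ((uint32_t) ip->imm & 31u)) +
                            (uint32_t) ip->imm2;
            break;
        case OP_HALT:
            return;
        }
        vm->x[0] = 0; /* writes to x0 are discarded */
    }
}

/* The same computation, with and without the superinstruction. */
const struct insn plain_prog[] = {
    {OP_ADDI, 5, 0, 3, 0},
    {OP_SLLI, 6, 5, 4, 0},
    {OP_ADDI, 6, 6, 7, 0},
    {OP_HALT, 0, 0, 0, 0},
};
const struct insn fused_prog[] = {
    {OP_ADDI, 5, 0, 3, 0},
    {OP_SLLI_ADDI, 6, 5, 4, 7},
    {OP_HALT, 0, 0, 0, 0},
};
```

Running both programs from a zeroed `struct vm` leaves the same value in `x[6]`, while the fused program passes through the dispatch switch one time fewer. In a real interpreter the saving per fused pair is one indirect branch plus whatever the compiler can simplify inside the merged handler.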

  • By having the JIT compiler emit common sequences as superinstructions, the execution speed can be further enhanced, as these superinstructions reduce the overhead of interpreting multiple individual instructions.
  • The ability to transform larger sequences, including loops, into a single superinstruction can lead to substantial performance gains, especially for repetitive or compute-intensive tasks.
  • Dynamic code generation and execution, a staple of JIT compilation, can introduce security vulnerabilities, such as just-in-time spraying attacks.

This approach also introduces additional complexity and requires careful consideration of resource utilization and security implications. Its effectiveness largely depends on the specific requirements and constraints of the emulation environment, and on how well the trade-offs involved are balanced.
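One way a profiler could recommend superinstruction candidates is to count how often each pair of adjacent opcodes is executed. The sketch below assumes a hypothetical four-entry opcode space and a pre-recorded opcode trace; `pair_profile`, `profile_trace`, and `hottest_pair` are illustrative names, not part of rv32emu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { N_OPS = 4 }; /* hypothetical, deliberately tiny opcode space */

struct pair_profile {
    uint32_t count[N_OPS][N_OPS]; /* count[a][b]: times b directly followed a */
};

/* Record every adjacent (previous, next) opcode pair of an executed trace. */
void profile_trace(struct pair_profile *p, const uint8_t *trace, size_t n)
{
    assert(p && trace);
    for (size_t i = 1; i < n; i++) {
        assert(trace[i - 1] < N_OPS && trace[i] < N_OPS);
        p->count[trace[i - 1]][trace[i]]++;
    }
}

/* Find the most frequent pair. Returns its count (0 if nothing was recorded)
 * and stores the two opcodes through first and second. */
uint32_t hottest_pair(const struct pair_profile *p,
                      uint8_t *first,
                      uint8_t *second)
{
    assert(p && first && second);
    uint32_t best = 0;
    *first = *second = 0;
    for (uint8_t a = 0; a < N_OPS; a++) {
        for (uint8_t b = 0; b < N_OPS; b++) {
            if (p->count[a][b] > best) {
                best = p->count[a][b];
                *first = a;
                *second = b;
            }
        }
    }
    return best;
}
```

For the trace `0 1 2 1 2 1 2 3`, the pair `(1, 2)` occurs three times and is reported as the hottest. A practical profiler would also need to ignore pairs that straddle a branch target, since a fused handler cannot be entered in the middle; this sketch does not model that.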

Reference:

jserv avatar Nov 29 '23 09:11 jserv

Regarding offline macro-op fusion: RISC-V is a fixed-width ISA, so certain operations are split into multiple instructions. Hardware RISC-V implementations that support macro-op fusion combine these instruction pairs at runtime as the code runs. However, we can perform this fusion offline, simplifying the process. This approach offers several benefits, including easier processing by tools and easier recompilation by the instruction set simulator. Importantly, it is reversible, allowing us to revert to normal RISC-V code if needed. Essentially, it provides an alternative, simpler way to express the same functionality. With the help of dynamic superinstructions, offline macro-op fusion can be realized more concretely.
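As an illustration of reversible offline fusion, the sketch below fuses the common `lui rd, hi` + `addi rd, rd, lo` pair into a single "load 32-bit immediate" form and splits it back. The `struct fused_li` type and both function names are hypothetical; only the RV32I encodings of `lui` (opcode 0x37) and `addi` (opcode 0x13, funct3 0) come from the ISA. The sketch assumes no jump targets the second instruction of the pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fused form: load a full 32-bit constant into rd. */
struct fused_li {
    uint8_t rd;
    uint32_t value;
};

static uint32_t rd_of(uint32_t insn)
{
    return (insn >> 7) & 0x1f;
}

/* Try to fuse "lui rd, hi" followed by "addi rd, rd, lo".
 * Returns true and fills *out on success; false if the pair does not match. */
bool fuse_lui_addi(uint32_t first, uint32_t second, struct fused_li *out)
{
    assert(out);
    bool is_lui = (first & 0x7f) == 0x37;
    bool is_addi = (second & 0x7f) == 0x13 && ((second >> 12) & 0x7) == 0;
    uint32_t rd = rd_of(first);
    uint32_t rs1 = (second >> 15) & 0x1f;
    if (!is_lui || !is_addi || rd == 0 || rd_of(second) != rd || rs1 != rd)
        return false;

    uint32_t lo12 = second >> 20;
    uint32_t lo = (lo12 ^ 0x800u) - 0x800u; /* sign-extend the 12-bit immediate */
    out->rd = (uint8_t) rd;
    out->value = (first & 0xfffff000u) + lo;
    return true;
}

/* Reverse the fusion: regenerate the two RV32I instruction words. */
void split_li(const struct fused_li *li, uint32_t *first, uint32_t *second)
{
    assert(li && first && second);
    uint32_t rd = li->rd;
    uint32_t lo12 = li->value & 0xfffu;
    /* Adding 0x800 compensates for addi sign-extending its immediate. */
    uint32_t hi = (li->value + 0x800u) & 0xfffff000u;
    *first = hi | (rd << 7) | 0x37u;                          /* lui rd, hi */
    *second = (lo12 << 20) | (rd << 15) | (rd << 7) | 0x13u;  /* addi rd, rd, lo */
}
```

For example, `lui a0, 0x12345` (`0x12345537`) followed by `addi a0, a0, 0x678` (`0x67850513`) fuses to `rd = 10`, `value = 0x12345678`, and `split_li` regenerates the same two words. The low 12 bits of the constant determine the `addi` immediate uniquely, which is why the round trip is exact for this pattern.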

jserv avatar Dec 01 '23 03:12 jserv

Reference: Binary Translation Using Peephole Superoptimizers

When compared to the native compiler, our translated code achieves median performance of 67% on large benchmarks and in some small stress tests actually outperforms the native compiler.

Figure: Performance comparison of the proposed translator (peep) with open source binary translator Qemu (qemu), and a commercial binary translator Apple Rosetta (rosetta). The bars represent performance relative to a natively compiled executable (higher is better). Missing bars are due to failed translations.

Souper is a superoptimizer for LLVM IR. It uses an SMT solver to help identify missing peephole optimizations in LLVM's midend optimizers.

BOLT is a post-link optimizer developed to speed up large applications. It achieves its improvements by optimizing an application's code layout based on an execution profile gathered by a sampling profiler, such as the Linux perf tool.

jserv avatar Dec 30 '23 15:12 jserv