Refactor PPU for performance improvements
This commit introduces two main performance enhancements to the PPU:
- Eager DMG Frame Buffer Calculation: The `Ppu::frame_buffer()` method previously used lazy evaluation for DMG mode, calculating the entire frame buffer from the shade_buffer on the first request. This commit changes `render_map_dmg` to populate `self.frame_buffer` directly during scanline rendering, similar to CGB mode. This distributes the computation cost and makes `Ppu::frame_buffer()` a consistently fast operation for both modes.
- Optimize `fill_frame_buffer` with Pattern Copy: The `Ppu::fill_frame_buffer` method, used for clearing the screen or filling it with a specific color, has been optimized. For the main loop that populates `self.frame_buffer` with a uniform color, the implementation now uses `std::ptr::copy_nonoverlapping` with a pre-filled repeating RGB pattern when the `simd` feature is enabled. This leverages potential compiler optimizations for bulk memory copies, which can be significantly faster than scalar iteration for this task. The original scalar loop is retained if the `simd` feature is not active.
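For reviewers unfamiliar with the pattern-copy approach, here is a minimal sketch of the idea, not the code in this commit. The function name `fill_rgb`, the 16-pixel pattern length, and the tail handling are all assumptions made for illustration; the actual `fill_frame_buffer` may choose a different pattern size and is gated behind the `simd` feature.

```rust
use std::ptr;

const RGB_SIZE: usize = 3;
// Hypothetical pattern length (16 pixels); the real code may differ.
const PATTERN_PIXELS: usize = 16;
const PATTERN_BYTES: usize = PATTERN_PIXELS * RGB_SIZE;

/// Fills a tightly packed RGB buffer with a single color by copying a
/// pre-filled repeating pattern in fixed-size chunks.
fn fill_rgb(frame_buffer: &mut [u8], color: [u8; 3]) {
    assert_eq!(frame_buffer.len() % RGB_SIZE, 0);

    // Build the repeating RGB pattern once, outside the main loop.
    let mut pattern = [0u8; PATTERN_BYTES];
    for pixel in pattern.chunks_exact_mut(RGB_SIZE) {
        pixel.copy_from_slice(&color);
    }

    // Main loop: one bulk copy per full chunk.
    let mut chunks = frame_buffer.chunks_exact_mut(PATTERN_BYTES);
    for chunk in &mut chunks {
        // SAFETY: `chunk` is exactly PATTERN_BYTES long, `pattern` is a
        // separate local array, so both regions are valid and disjoint.
        unsafe {
            ptr::copy_nonoverlapping(pattern.as_ptr(), chunk.as_mut_ptr(), PATTERN_BYTES);
        }
    }

    // Tail: fewer than PATTERN_BYTES remain, still a whole number of pixels.
    let tail = chunks.into_remainder();
    let len = tail.len();
    tail.copy_from_slice(&pattern[..len]);
}
```

Calling `fill_rgb(&mut buffer, [255, 255, 255])` on a 160x144 RGB buffer would clear it to white. Whether this beats a scalar loop depends on the target and optimization level, so it is worth benchmarking rather than assuming.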
These changes aim to reduce potential stutter in DMG mode and accelerate screen fill operations, contributing to overall emulator performance.
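The eager DMG path can be pictured roughly as follows. This is a simplified sketch under stated assumptions: `MiniPpu`, its `palette` field, and `render_line` are hypothetical stand-ins, not the real `Ppu` or `render_map_dmg`. It only shows the shape of the change, where each scanline writes its RGB pixels as it is rendered so that reading the frame buffer needs no conversion pass.

```rust
const WIDTH: usize = 160;
const HEIGHT: usize = 144;
const RGB_SIZE: usize = 3;

/// Hypothetical, stripped-down PPU used only for illustration.
struct MiniPpu {
    /// Four DMG shades mapped to RGB (assumed palette layout).
    palette: [[u8; 3]; 4],
    /// Tightly packed RGB output, WIDTH * HEIGHT pixels.
    frame_buffer: Vec<u8>,
}

impl MiniPpu {
    fn new(palette: [[u8; 3]; 4]) -> Self {
        Self {
            palette,
            frame_buffer: vec![0; WIDTH * HEIGHT * RGB_SIZE],
        }
    }

    /// Eager path: convert shade indices to RGB while the scanline is
    /// rendered, instead of deferring the work to the first read.
    fn render_line(&mut self, ly: usize, shades: &[u8; WIDTH]) {
        let start = ly * WIDTH * RGB_SIZE;
        let line = &mut self.frame_buffer[start..start + WIDTH * RGB_SIZE];
        for (pixel, &shade) in line.chunks_exact_mut(RGB_SIZE).zip(shades.iter()) {
            // Mask to two bits so the index is always within the palette.
            pixel.copy_from_slice(&self.palette[(shade & 0x03) as usize]);
        }
    }

    /// Reading the buffer is now a plain borrow with no per-frame work.
    fn frame_buffer(&self) -> &[u8] {
        &self.frame_buffer
    }
}
```

With this layout the per-frame conversion cost is spread over 144 scanline calls, which is the property the commit relies on to avoid a spike on the first `frame_buffer()` request.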
Summary by CodeRabbit
- Performance Improvements
  - Enhanced frame buffer filling with SIMD optimization for faster rendering when supported.
- Bug Fixes
  - Improved accuracy of color rendering in DMG mode by updating the frame buffer immediately with the correct palette colors.