Refactor PPU for performance improvements

Open joamag opened this issue 9 months ago • 1 comments

This commit introduces two main performance enhancements to the PPU:

- Eager DMG Frame Buffer Calculation: Ppu::frame_buffer() previously evaluated lazily in DMG mode, computing the entire frame buffer from the shade_buffer on the first request. render_map_dmg now writes to self.frame_buffer directly during scanline rendering, as CGB mode already does. This spreads the conversion cost across scanlines and makes Ppu::frame_buffer() a consistently cheap call in both modes.
- Optimize fill_frame_buffer with Pattern Copy: Ppu::fill_frame_buffer, which clears the screen or fills it with a single color, now uses std::ptr::copy_nonoverlapping with a pre-filled repeating RGB pattern in its main fill loop when the simd feature is enabled. Copying in bulk lets the compiler emit wide memory copies, which can be significantly faster than writing pixels one at a time. The original scalar loop is retained when the simd feature is not active.
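
The pattern-copy fill can be sketched roughly as follows. This is a minimal illustration and not the code from this commit: the free functions `fill_scalar` and `fill_pattern`, the 16-pixel pattern size, and the 3-byte RGB layout are assumptions made for the example, and the `simd` feature gate is left out so that both paths can be compared side by side.

```rust
/// Bytes per pixel, assuming a packed RGB frame buffer (assumption).
const RGB_SIZE: usize = 3;

/// Scalar fill: writes the color one pixel at a time.
/// Trailing bytes that do not form a full pixel are left untouched.
fn fill_scalar(buffer: &mut [u8], color: [u8; RGB_SIZE]) {
    for pixel in buffer.chunks_exact_mut(RGB_SIZE) {
        pixel.copy_from_slice(&color);
    }
}

/// Pattern fill: builds a small repeating RGB pattern once, then
/// copies it into the buffer in bulk. Assumes `buffer.len()` is a
/// multiple of `RGB_SIZE`, as it is for a whole frame.
fn fill_pattern(buffer: &mut [u8], color: [u8; RGB_SIZE]) {
    // Hypothetical pattern width, chosen only for illustration.
    const PATTERN_PIXELS: usize = 16;
    const PATTERN_SIZE: usize = PATTERN_PIXELS * RGB_SIZE;

    // Pre-fill the pattern with the repeated color.
    let mut pattern = [0u8; PATTERN_SIZE];
    for pixel in pattern.chunks_exact_mut(RGB_SIZE) {
        pixel.copy_from_slice(&color);
    }

    let mut chunks = buffer.chunks_exact_mut(PATTERN_SIZE);
    for chunk in &mut chunks {
        // SAFETY: `chunk` is exactly PATTERN_SIZE bytes long, `pattern`
        // is a separate local array of the same length, and the two
        // regions cannot overlap.
        unsafe {
            std::ptr::copy_nonoverlapping(pattern.as_ptr(), chunk.as_mut_ptr(), PATTERN_SIZE);
        }
    }

    // The tail is shorter than one pattern and starts on a pixel
    // boundary, so a prefix of the pattern fills it correctly.
    let remainder = chunks.into_remainder();
    let len = remainder.len();
    remainder.copy_from_slice(&pattern[..len]);
}
```

Calling `fill_pattern(&mut frame, [0xff, 0xff, 0xff])` on a 160 × 144 × 3 byte buffer would produce the same bytes as `fill_scalar`. Whether it is actually faster depends on the target and optimization level, so it is worth benchmarking before relying on it.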

These changes aim to reduce potential stutter in DMG mode and accelerate screen fill operations, contributing to overall emulator performance.
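
To make the stutter argument concrete, the sketch below contrasts the lazy and eager strategies on a toy type. `MiniPpu`, `PALETTE`, `render_line_eager` and `frame_buffer_lazy` are hypothetical names invented for this example; the real Ppu is considerably more involved and its palette is not fixed.

```rust
const WIDTH: usize = 160;
const HEIGHT: usize = 144;
const RGB_SIZE: usize = 3;

/// Hypothetical fixed four-shade DMG palette, lightest to darkest.
const PALETTE: [[u8; RGB_SIZE]; 4] = [
    [255, 255, 255],
    [192, 192, 192],
    [96, 96, 96],
    [0, 0, 0],
];

/// Toy stand-in for the PPU state relevant to this change.
struct MiniPpu {
    /// One shade index (0..=3) per pixel.
    shade_buffer: Vec<u8>,
    /// Packed RGB output, three bytes per pixel.
    frame_buffer: Vec<u8>,
}

impl MiniPpu {
    fn new() -> Self {
        Self {
            shade_buffer: vec![0; WIDTH * HEIGHT],
            // Shade 0 maps to white, so start with a white frame.
            frame_buffer: vec![255; WIDTH * HEIGHT * RGB_SIZE],
        }
    }

    /// Lazy strategy: convert every pixel in one go when the frame
    /// is requested, which concentrates the cost in a single call.
    fn frame_buffer_lazy(&mut self) -> &[u8] {
        for (pixel, &shade) in self
            .frame_buffer
            .chunks_exact_mut(RGB_SIZE)
            .zip(self.shade_buffer.iter())
        {
            pixel.copy_from_slice(&PALETTE[(shade & 0x03) as usize]);
        }
        &self.frame_buffer
    }

    /// Eager strategy: write the RGB value while the scanline is
    /// rendered, spreading the cost over all 144 lines.
    fn render_line_eager(&mut self, ly: usize, shades: &[u8; WIDTH]) {
        let base = ly * WIDTH;
        for (x, &shade) in shades.iter().enumerate() {
            self.shade_buffer[base + x] = shade;
            let offset = (base + x) * RGB_SIZE;
            self.frame_buffer[offset..offset + RGB_SIZE]
                .copy_from_slice(&PALETTE[(shade & 0x03) as usize]);
        }
    }

    /// With eager rendering the getter only hands out a reference.
    fn frame_buffer(&self) -> &[u8] {
        &self.frame_buffer
    }
}
```

In the eager variant, the per-frame work done by `frame_buffer()` no longer depends on the screen size, which is the property the description above relies on.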

Summary by CodeRabbit

Performance Improvements
- Enhanced frame buffer filling with SIMD optimization for faster rendering when supported.
Bug Fixes
- Improved accuracy of color rendering in DMG mode by updating the frame buffer immediately with the correct palette colors.

May 25 '25 19:05 joamag