BizHawk
Opt/run loop optimizations
This changeset includes some (somewhat experimental) performance optimizations for the main run loop. Optimization was guided by instrumentation and measured by the displayed FPS while unthrottled and with Rewind disabled. I also optimized a few passes to reduce heap allocations and converted some of the most frequently allocated types to structs (e.g. ReturnData). I attempted to isolate changes to code on the hot path. For example, ReturnDataStruc could and probably should replace ReturnData universally, but doing so wouldn't have a significant impact on performance.
Each commit was evaluated separately, and for the most part, each commit may be cherry-picked and applied independently.
I verified the changes don't break gameplay on MelonDS, but did not test with other cores. The most significant change (and most significant gain) was calling wbx_activate_host and wbx_deactivate_host only once, which alone represents about a 30% gain. FilterProgram is cached, keyed on the parameters used during creation, although the key likely needs to be refined to include additional global configuration settings.
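To illustrate the caching concern, here is a minimal Python sketch of a cache keyed on creation parameters. It is not the BizHawk code: `FilterKey`, `FilterProgramCache`, `build_program`, and the `global_settings` field are hypothetical names chosen for illustration. The point it shows is that any input that changes the built result has to be part of the key, otherwise a stale entry is returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterKey:
    """Hypothetical cache key: every input that affects the built program."""
    width: int
    height: int
    filter_name: str
    # Hypothetical stand-in for global configuration that also affects the result.
    global_settings: tuple = ()


class FilterProgramCache:
    """Builds a program once per distinct key and reuses it afterwards."""

    def __init__(self, build):
        self._build = build   # expensive factory, called only on a cache miss
        self._entries = {}
        self.builds = 0       # number of cache misses, for demonstration

    def get(self, key):
        program = self._entries.get(key)
        if program is None:
            program = self._build(key)
            self._entries[key] = program
            self.builds += 1
        return program


def build_program(key):
    # Placeholder for the real, expensive construction.
    return f"program({key.filter_name}, {key.width}x{key.height}, {key.global_settings})"


cache = FilterProgramCache(build_program)
a = cache.get(FilterKey(256, 384, "none"))
b = cache.get(FilterKey(256, 384, "none"))                           # hit: same key
c = cache.get(FilterKey(256, 384, "none", (("bilinear", True),)))    # miss: config differs
```

In this sketch the second lookup reuses the first result, while the third builds a new program because the configuration tuple is part of the key. If `global_settings` were left out of the key, the third lookup would silently return the program built for the old configuration.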
Check if completed:
- [x] I have run any relevant test suites
- [x] I, the committer, have read the licensing terms for contributors (last updated 2022-07-15) and am compliant
While some of these changes (like converting heap allocations to stack allocation and thereby reducing GC load) seem like sensible changes we probably want, I cannot seem to reproduce your mentioned 30% gain in fps. Were you testing in release or debug config?
When checking on a normal release build, unthrottled speed, rewind disabled, in Pokémon Platinum (after pressing start on the main menu) on the MelonDS core, I get around 400fps on both master and your branch (it does seem to be a bit faster on yours though, maybe 5%).
The ~33% gain was with the Release build, running without debugging or instrumentation. Here are the actual numbers I was seeing with HeartGold. I haven't seen anything in the neighborhood of 400 fps. The framerate that I experience seems to match the streamer I follow.
- Debug fps (before): 27 fps
- Debug fps (after): 109 fps
- Release fps (before): 90 fps
- Release fps (after): 127 fps
Specifically, I loaded the same savestate each time, which is in the lab before selecting a mon. On Release build, framerate jumped as high as 170fps but averaged around 127fps over about 10 seconds. With instrumentation enabled, the wbx_activate_host and wbx_deactivate_host accounted for about 37% of the CPU time within the run loop, with the other enhancements contributing about another 14% reduction.
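Because percentages in this thread are quoted both as fps gains and as shares of CPU time, the short Python sketch below converts the before/after fps figures quoted above into both forms. The helper names `fps_gain` and `frame_time_reduction` are made up for this example, and it makes no claim about which of the two readings any particular percentage in the discussion refers to.

```python
def fps_gain(before_fps, after_fps):
    """Relative increase in frames per second."""
    return after_fps / before_fps - 1.0


def frame_time_reduction(before_fps, after_fps):
    """Relative decrease in time spent per frame (frame time = 1 / fps)."""
    return 1.0 - before_fps / after_fps


# Figures taken from the measurements quoted in this comment.
for label, before, after in [("Debug", 27, 109), ("Release", 90, 127)]:
    print(f"{label}: {fps_gain(before, after):.0%} more fps, "
          f"{frame_time_reduction(before, after):.0%} less time per frame")
```

For the Release figures this works out to roughly 41% more frames per second, or equivalently about 29% less time per frame. The two numbers describe the same change, so it helps to state which one is meant when comparing results.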
It would probably be interesting to get data from other people, or alternatively figure out why you're seeing such a high cost of wbx_activate_host, because this is how it looks for me:
The ReturnData change should be fine, and I see no reason not to use it both here and for every waterbox call that uses ReturnData.
Please remove the waterbox host enter/exit changes and put them in a separate PR. There's a lot of complexity there that I'll need some time to think about and discuss.
Reverted and reapplied all changes except to wbx_activate_host. Made changes to ReturnData for all function calls.
Can someone send me a link with this PR implemented? As in, one that's compiled already? I wanted to test ares.
Sorry for the late response, but there is a boost in the ares core. On build b8444f8b, Buck Bumble runs at 41fps and drops down to 38 on transition. On this PR, it rises to 45fps and never goes below 41 on the first screen.
I have not done a comparison on the gameplay part yet, but I'm sure it's there too.
Very nice!