crab
Used bit packing for evaluating ARM condition codes instead of a switch-case
This PR gains a couple of FPS in Emerald in my VM, depending on the scene. Let me know if you see any benefit from it.
Whether an ARM condition is true depends on 2 factors:
- The condition code (4 bits)
- The upper 4 CPSR bits (the N, Z, C and V flags)

This means that you can use a 256-entry truth table indexed by those 8 bits combined, instead of a switch-case, which would probably compile to an array lookup + indirect jump.
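Here is a minimal C sketch of the 256-entry truth table, for comparison with the packed version. The function names are hypothetical, and treating condition 0xF (NV) as "never" is an assumption that fits ARMv4 cores like the GBA's ARM7TDMI. The table is filled from a switch-based reference evaluator, so each entry should match what the switch would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Switch-based reference. `flags` is the NZCV nibble (CPSR >> 28):
   bit 3 = N, bit 2 = Z, bit 1 = C, bit 0 = V. */
static bool cond_passes_switch(uint32_t cond, uint32_t flags) {
    bool n = (flags >> 3) & 1, z = (flags >> 2) & 1;
    bool c = (flags >> 1) & 1, v = flags & 1;
    switch (cond & 0xF) {
    case 0x0: return z;              /* EQ */
    case 0x1: return !z;             /* NE */
    case 0x2: return c;              /* CS */
    case 0x3: return !c;             /* CC */
    case 0x4: return n;              /* MI */
    case 0x5: return !n;             /* PL */
    case 0x6: return v;              /* VS */
    case 0x7: return !v;             /* VC */
    case 0x8: return c && !z;        /* HI */
    case 0x9: return !c || z;        /* LS */
    case 0xA: return n == v;         /* GE */
    case 0xB: return n != v;         /* LT */
    case 0xC: return !z && n == v;   /* GT */
    case 0xD: return z || n != v;    /* LE */
    case 0xE: return true;           /* AL */
    default:  return false;          /* NV: treated as "never" (assumption) */
    }
}

/* 256 one-byte entries indexed by (cond << 4) | NZCV. */
static uint8_t truth_table[256];

static void init_truth_table(void) {
    for (uint32_t i = 0; i < 256; i++)
        truth_table[i] = cond_passes_switch(i >> 4, i & 0xF);
}

/* The condition lives in opcode[31:28] and the flags in cpsr[31:28], so the
   8-bit index is just those two nibbles placed side by side. */
static bool cond_passes_table(uint32_t opcode, uint32_t cpsr) {
    return truth_table[((opcode >> 24) & 0xF0) | (cpsr >> 28)] != 0;
}
```

`init_truth_table()` has to run once at startup, before the first lookup. The lookup itself is branch-free, but it touches a 256-byte table.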
LUTs generally aren't the best thing for the cache and stuff, so they shouldn't be abused toooo much. So here's a neat bit-packing trick that originates from melonDS's ARM interpreter. Instead of a switch-case, it checks whether a condition is true using a packed 32-byte (16 * 2) LUT of masks, one 16-bit mask per condition code. The 16 masks are magic numbers that get ANDed with `1 << CPSR_FLAGS`, where `CPSR_FLAGS` is the upper 4 CPSR bits. The masks are built so that `masks[conditionCode] & (1 << CPSR_FLAGS)`
is always non-zero if the condition is met and 0 if not. This way, you can:
- Minimize the dcache overhead of a 256-byte truth table by tightly packing it (32 bytes are fewer than 256 :p)
- Avoid the switch-case
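Below is a minimal C sketch of the packed mask LUT. The names are hypothetical, and NV (0xF) is again assumed to mean "never". Bit `f` of each mask is set exactly when the condition passes for NZCV nibble `f`, with N = bit 3, Z = bit 2, C = bit 1 and V = bit 0. For example, EQ needs Z set, which holds for f = 4-7 and 12-15, so its mask is 0xF0F0. HI is C && !Z, which is 0xCCCC & 0x0F0F = 0x0C0C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One 16-bit mask per condition code: 16 * 2 = 32 bytes in total.
   Bit f is set iff the condition passes when (CPSR >> 28) == f. */
static const uint16_t cond_masks[16] = {
    0xF0F0, /* EQ: Z                */
    0x0F0F, /* NE: !Z               */
    0xCCCC, /* CS: C                */
    0x3333, /* CC: !C               */
    0xFF00, /* MI: N                */
    0x00FF, /* PL: !N               */
    0xAAAA, /* VS: V                */
    0x5555, /* VC: !V               */
    0x0C0C, /* HI: C && !Z          */
    0xF3F3, /* LS: !C || Z          */
    0xAA55, /* GE: N == V           */
    0x55AA, /* LT: N != V           */
    0x0A05, /* GT: !Z && N == V     */
    0xF5FA, /* LE: Z || N != V      */
    0xFFFF, /* AL: always           */
    0x0000, /* NV: never (assumed)  */
};

/* A non-zero AND means the instruction should execute. */
static bool cond_passes_packed(uint32_t opcode, uint32_t cpsr) {
    return (cond_masks[opcode >> 28] & (1u << (cpsr >> 28))) != 0;
}
```

The complementary conditions come in pairs (EQ/NE, GE/LT, GT/LE and so on), and each pair's masks are bitwise inverses of each other. That makes the table easy to sanity-check by hand.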
I used Pokemon Emerald to make sure it works, and also arm.gba, which still passes. I tried ARMWrestler too, but I couldn't find the start button. It boots, though. ~~Tell me what you think when you can~~
Hey thanks so much for submitting this! This is a tricky little change that I don't think I would have thought of haha. However, I just pulled down the changes locally and compared them against the FPS I'm currently seeing, and I wasn't actually able to see any improvement in Emerald. In fact, I'm seeing ~2 FPS lower on Golden Sun on average across a few runs. I don't really understand why it would be slower for me, since logically it seems like it should just be an improvement. You were able to see an FPS gain though?
Yeah though nothing too groundbreaking. Oh well :(
Somewhat related to this issue, but I think an easy / free optimisation is to check whether the cond is AL (0xE). If so, just execute the instruction. Otherwise, use the LUT (or switch). In the vast majority of cases the cond is going to be AL, so the switch / LUT won't be hit.
If Crystal supports marking branches as likely/unlikely, you can label that `if (cond == AL)` as likely.
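Here is a minimal C sketch of the AL fast path, in case it helps to see it concretely. This is not a claim about what Crystal offers. In C with GCC or Clang, the "likely" hint is spelled `__builtin_expect`, and C++20 has the `[[likely]]` attribute. The `LIKELY` macro, `COND_AL` and `cond_passes` are hypothetical names. The fallback reuses the packed per-condition masks, with NV assumed to mean "never".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* __builtin_expect is a GCC/Clang builtin; other compilers get a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define LIKELY(x) (x)
#endif

#define COND_AL 0xEu

/* Per-condition NZCV masks, EQ..NV (NV assumed to be "never"). */
static const uint16_t cond_masks[16] = {
    0xF0F0, 0x0F0F, 0xCCCC, 0x3333, 0xFF00, 0x00FF, 0xAAAA, 0x5555,
    0x0C0C, 0xF3F3, 0xAA55, 0x55AA, 0x0A05, 0xF5FA, 0xFFFF, 0x0000,
};

static bool cond_passes(uint32_t opcode, uint32_t cpsr) {
    uint32_t cond = opcode >> 28;
    /* Most ARM instructions are unconditional, so test AL first and hint it. */
    if (LIKELY(cond == COND_AL))
        return true;
    return (cond_masks[cond] & (1u << (cpsr >> 28))) != 0;
}
```

Whether the extra compare pays off depends on how well the host branch predictor already handles the lookup. That may be why the measured difference was small.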
@ITotalJustice Thanks for the idea! Tested in 8d9c789, although I didn't see any noticeable improvement in the few games I tested