
Core: Create fastmem mappings for page address translation

Open JosJuice opened this issue 6 months ago • 9 comments

Previously we've only been setting up fastmem mappings for block address translation, but now we also do it for page address translation. This increases performance when games access memory using page tables, but decreases performance when games set up page tables.
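To make the trade-off above concrete, here is a toy Python model, not Dolphin's code. It assumes a flat dict as the page table (a real PowerPC page table is a hashed table of PTE groups) and uses a dict named fastmem to stand in for host memory mappings. ToyMMU and translate are hypothetical names. The first access to a page pays for a page table walk plus installing a mapping, and later accesses to the same page skip the walk.

```python
PAGE_SHIFT = 12              # 4 KiB guest pages
PAGE_SIZE = 1 << PAGE_SHIFT
PAGE_MASK = PAGE_SIZE - 1


class ToyMMU:
    """Hypothetical model: a guest page table plus a cache of host mappings.

    page_table maps guest virtual page numbers to guest physical page numbers.
    fastmem stands in for host virtual memory mappings: once a translation is
    installed there, accesses no longer consult the page table.
    """

    def __init__(self, page_table):
        self.page_table = dict(page_table)
        self.fastmem = {}
        self.slow_lookups = 0
        self.fast_hits = 0

    def translate(self, vaddr):
        vpage, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
        if vpage in self.fastmem:
            # Fast path: the mapping already exists.
            self.fast_hits += 1
            ppage = self.fastmem[vpage]
        else:
            # Slow path: walk the page table, then install a mapping so the
            # next access to this page takes the fast path.
            self.slow_lookups += 1
            ppage = self.page_table[vpage]  # KeyError models a page fault
            self.fastmem[vpage] = ppage
        return (ppage << PAGE_SHIFT) | offset
```

In this model, four accesses to one page cost one slow lookup and three fast hits. A game that keeps rewriting its page tables would instead keep paying for new mappings, which on a real host typically means an mmap-style system call per mapping.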

The tlbie instruction is used as an indication that the mappings need to be updated.
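A minimal sketch of using tlbie as the invalidation signal, again a hypothetical Python model rather than Dolphin's implementation (ToyFastmem, translate and tlbie are illustrative names). Architecturally, software is expected to execute tlbie after modifying a PTE, so a cached translation can be kept until then. The model simplifies tlbie to invalidating a single page; on real hardware it may invalidate more entries than that.

```python
PAGE_SHIFT = 12                   # 4 KiB guest pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class ToyFastmem:
    """Hypothetical model: translations stay cached until the guest runs tlbie."""

    def __init__(self, page_table):
        self.page_table = page_table  # guest-owned: vpage -> ppage
        self.mappings = {}            # host-side cached translations

    def translate(self, vaddr):
        vpage = vaddr >> PAGE_SHIFT
        if vpage not in self.mappings:
            # Not mapped yet: read the current PTE and create the mapping.
            self.mappings[vpage] = self.page_table[vpage]
        return (self.mappings[vpage] << PAGE_SHIFT) | (vaddr & PAGE_MASK)

    def tlbie(self, vaddr):
        # The guest signals that a PTE changed: drop the possibly stale
        # mapping so the next access re-reads the page table.
        self.mappings.pop(vaddr >> PAGE_SHIFT, None)
```

Until the guest executes tlbie, an edited PTE is not seen: translating 0x80004 keeps returning the old physical address after the page table changes, and returns the new one only after tlbie(0x80000).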

There is the accuracy downside that the TLB is now effectively infinitely large. No games are known to be affected by this, and you still get the old, more accurate behavior if Enable Write-Back Cache is on.
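The accuracy downside can be illustrated with a toy translation cache (hypothetical Python, not Dolphin's code). If a game edits a PTE without executing tlbie, a finite TLB eventually evicts the old entry and picks up the new one, while an effectively infinite cache keeps the stale translation. The fully associative LRU cache with a capacity of 2 is an arbitrary simplification; real TLBs are set-associative and larger.

```python
from collections import OrderedDict


class ToyTLB:
    """Translation cache with an optional capacity (None means unbounded)."""

    def __init__(self, page_table, capacity=None):
        self.page_table = page_table
        self.capacity = capacity
        self.entries = OrderedDict()  # vpage -> ppage, oldest first

    def lookup(self, vpage):
        if vpage in self.entries:
            self.entries.move_to_end(vpage)  # mark as most recently used
            return self.entries[vpage]
        ppage = self.page_table[vpage]
        self.entries[vpage] = ppage
        if self.capacity is not None and len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return ppage


def run(capacity):
    page_table = {0: 100, 1: 101, 2: 102}
    tlb = ToyTLB(page_table, capacity)
    tlb.lookup(0)
    page_table[0] = 200  # guest edits the PTE but never executes tlbie
    tlb.lookup(1)
    tlb.lookup(2)        # with capacity 2, this evicts page 0
    return tlb.lookup(0)
```

run(2) returns the updated physical page 200, while run(None), the unbounded case, still returns the stale 100.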

Left to do:

  • [ ] Support for host page sizes larger than 4K
  • [ ] macOS is untested
  • [x] Rogue Squadron 3 is super slow due to the pessimistic setting of R and C bits (I guess the game is heavily swapping? DoJit is running very often)
  • [ ] Savestate migration?
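One possible way to think about the first open item: a host page larger than 4 KiB can only be backed by a single mapping if every 4 KiB guest page inside it is mapped, physically contiguous, and starts at a physical address aligned to the host page size, since mmap-style APIs require page-aligned offsets. The helper below is a hypothetical Python sketch of that check, not the approach the PR takes; it ignores per-page permissions, which would also need to agree across the whole host page.

```python
GUEST_PAGE_SIZE = 4 * 1024


def host_page_physical_base(page_table, vaddr, host_page_size):
    """Return the guest physical base address if the host page containing
    vaddr can be backed by one mapping, else None.

    page_table maps 4 KiB guest virtual page numbers to guest physical
    page numbers.
    """
    ratio = host_page_size // GUEST_PAGE_SIZE       # guest pages per host page
    first_vpage = (vaddr // host_page_size) * ratio
    base_ppage = page_table.get(first_vpage)
    # The physical base must be aligned to the host page size.
    if base_ppage is None or base_ppage % ratio != 0:
        return None
    # Every following guest page must be mapped to the next physical page.
    for i in range(1, ratio):
        if page_table.get(first_vpage + i) != base_ppage + i:
            return None
    return base_ppage * GUEST_PAGE_SIZE
```

For example, with 16 KiB host pages, guest pages 4 to 7 mapped to physical pages 8 to 11 can share one mapping, but remapping any one of them elsewhere forces a fallback for the whole host page.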

JosJuice, Jun 21 '25

This greatly improves the performance of RS3 on my computer, to the point where even Hoth runs at full speed. There are still some hiccups when it creates a lot of mappings at once, and I need to test more stages to be sure, but there is a definite performance boost.

JMC47, Aug 03 '25

Would one of you be able to build me an APK of this PR? I'm interested in seeing whether this can help offset the performance hit from using Manual Texture Sampling on recent Adreno chipsets.

stlouiscpht1, Sep 20 '25

An APK is already available from the buildbot. Click on the "pr-android" build check.

Just keep in mind that most games don't use page tables and therefore see no benefit from this.

JosJuice, Sep 20 '25

Got it.

As this PR currently stands, it does help the performance of both Rogue Squadron games with the following setup (default GPU driver, Manual Texture Sampling, EFB copies set to Texture Only), but it still isn't perfect: it gets bogged down a lot by mappings. Vulkan performs better than OpenGL, but the type of shaders used makes no real difference.

Where it really helped my Adreno GPU, though, is when paired with Turnip 25.2.0 r10 and EFB copies set to Texture Only. The speed improvements are immense: Hoth and the initial Yavin level in RS3 play at full speed and are smooth, without any mapping hiccups that I could see. This setup obviously only works with Vulkan, but I never had to change from the default specialized shaders or enable MTS.

Unfortunately, Rogue Squadron 2 didn't see much change with the Turnip driver and this PR, so I guess it doesn't use the page tables.

There is one other thing. I was curious to see how Enable Write-Back Cache affected things. My ROG Phone 8 did not like that idea one bit and crashed as soon as the game started. I checked, and it also does that with the release versions of Dolphin.

stlouiscpht1, Sep 20 '25

Rogue Squadron 2 does use page tables. I'm guessing you're bottlenecked by something else in that game, then.

JosJuice, Sep 20 '25

Can the sub-tasks be moved to a different PR?

For macOS, unless someone posts a binary on the forum, you'll probably only get feedback once this is merged into trunk.

nbohr1more, Oct 03 '25

The subtasks are pretty small. The reason I haven't worked on this recently is that I've been working on FMA accuracy instead. The bigger task is actually writing tests for the code I've added.

JosJuice, Oct 04 '25

Snapdragon 8s Gen 3:
  • Store EFB copies to texture only ✅
  • Cull vertices on the CPU ✅
  • Vulkan ✅
  • Turnip 25.3.0 R10 ✅

Performance varies greatly, from 25 to 50 FPS, depending on the number of effects on screen. That is a significant improvement.

[Screenshot from org.dolphinemu.dolphinemu.debug, 2025-10-09 09:27:30]

Cristobal15, Oct 09 '25

I get the same performance in RS3 on my ROG Phone 8 with EFB copies set to Texture Only and Turnip 25.2.0 r10 (forced GMEM). Performance is a little lower with newer Turnip drivers, especially the last few that don't have forced-GMEM versions, likely because I have to turn off Store XFB Copies to Texture Only to clear up graphical artifacts. It's still better than the release versions, though. I noticed no difference with Cull Vertices on the CPU enabled or disabled; it's probably more of a help on the lower-tier 8s Gen 3 than on the full 8 Gen 3.

Rogue Squadron 2 is much improved, though. It seems the earlier bottleneck I encountered was caused by Vulkan backend multithreading, which is enabled by default. Turning it off made a huge difference.

stlouiscpht1, Oct 18 '25