GPU: Don't trigger uploads for redundant buffer updates

Open riperiperi opened this issue 3 years ago • 7 comments

Benefits:

  • Greatly reduces the number of buffer uploads in certain games (4096-byte cost each time)
  • Avoids splitting render pass on Vulkan (cost depends on GPU driver)
  • Batches adjacent ConstantBufferUpdater (CBU) writes into one access, which reduces the number of memory manager calls.

Downsides:

  • Checks/triggers read tracking for upload regions (slightly slower)
  • Can be slow in software memory if the range is non-contiguous (forces a redundant update anyway)
    • The redundancy check method says that it can return true if checking would be too slow, and that's what it does in this case. Allocating and filling an array with the non-contiguous regions would be a lot slower than just saying the write happened.
  • Upload must be flushed by potential users (could be buggy if a case is missed)
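
The conservative redundancy check described in the downsides above can be sketched roughly as follows. This is a minimal illustration, not Ryujinx's actual code: `SketchMemory`, `write_unless_redundant`, and the page-crossing test are all hypothetical names and simplifications chosen for the example.

```python
class SketchMemory:
    """Hypothetical guest memory whose pages may not be contiguous on the host."""

    PAGE_SIZE = 4096

    def __init__(self, size: int, contiguous: bool = True):
        self.data = bytearray(size)
        self.contiguous = contiguous

    def write_unless_redundant(self, address: int, new_data: bytes) -> bool:
        """Write new_data. Return False only if the write is known to be redundant."""
        end = address + len(new_data)
        crosses_page = address // self.PAGE_SIZE != (end - 1) // self.PAGE_SIZE

        if not self.contiguous and crosses_page:
            # Comparing would mean gathering the scattered pieces into a
            # temporary array first. That is assumed to cost more than it
            # saves, so just write and report the data as modified.
            self.data[address:end] = new_data
            return True

        if self.data[address:end] == new_data:
            # Same bytes already in memory: the caller can skip the upload.
            return False

        self.data[address:end] = new_data
        return True
```

Returning `True` when unsure is the safe direction: the worst case is an upload that was not strictly needed, never a missed one.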

This has a lot of potential to break things since it moves the actual data upload to the flush method. This is to avoid performing the redundancy check per-integer for certain uploads. Hopefully we should flush before everything that can reasonably read CBU output.
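
A rough sketch of the deferred-upload idea, under stated assumptions: `SketchBufferUpdater`, `update`, `flush`, `draw`, and the `uploads` counter are hypothetical names for illustration and do not mirror the real ConstantBufferUpdater. Adjacent word writes are accumulated, the redundancy check runs once per batch at flush time, and anything that reads the buffer has to flush first.

```python
class SketchBufferUpdater:
    """Hypothetical updater that defers writes until flush()."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.start = 0               # address of the first pending byte
        self.pending = bytearray()   # batched data not yet written
        self.uploads = 0             # number of non-redundant uploads performed

    def update(self, address: int, word: int) -> None:
        """Queue one 32-bit word; adjacent writes are merged into one batch."""
        if self.pending and address != self.start + len(self.pending):
            # Not adjacent to the current batch, so flush it and start anew.
            self.flush()
        if not self.pending:
            self.start = address
        self.pending += word.to_bytes(4, "little")

    def flush(self) -> None:
        """Write the batch, checking redundancy once for the whole range."""
        if not self.pending:
            return
        end = self.start + len(self.pending)
        if self.memory[self.start:end] != self.pending:
            self.memory[self.start:end] = self.pending
            self.uploads += 1
        self.pending = bytearray()

    def draw(self) -> bytes:
        """Stand-in for any consumer of the buffer: it must flush first."""
        self.flush()
        return bytes(self.memory)
```

The risk mentioned above shows up here directly: a reader that looks at `memory` without calling `flush()` would see stale data.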

Improves performance in Xenoblade Chronicles: Definitive Edition (on Vulkan; note there is another issue here) and Link's Awakening. May very slightly reduce performance in games whose CBU writes are not redundant and that are pegged at 99% FIFO.

This one is dangerous, so check with as many games and different engines as you can.

riperiperi avatar Nov 09 '22 23:11 riperiperi

Tested around 15 different games using different engines. No regressions found. Nice work!

LukeWarnut avatar Nov 10 '22 06:11 LukeWarnut

Before: image

After: image

Note: 76 fps on OpenGL. The remaining slowdown is due to the game creating an excessive amount of sync when drawing characters.

riperiperi avatar Nov 10 '22 10:11 riperiperi

Tested about 8 games and played each for a bit to make sure nothing broke, seems perfectly fine so far.

Bjorn29512 avatar Nov 10 '22 15:11 Bjorn29512

Tested the following games with no noticeable regressions. Didn't observe any noticeable performance differences either (though all games except Hyrule Warriors ran at full speed on both branches). Windows 11, Ryzen 3800, Nvidia 1660 Super, Vulkan

  • Torchlight 3
  • Tricky Towers
  • Cadence of Hyrule
  • Xenoblade 2
  • A Short Hike
  • Deltarune
  • Lumines Remastered
  • Dragon Ball FighterZ
  • Valkyria Chronicles 4
  • Dead or Alive Xtreme 3 Scarlet
  • Alien Isolation
  • Shantae and the Seven Sirens
  • Sonic Mania
  • Metroid Dread
  • Hyrule Warriors
  • Picross S4
  • Kirby and the Forgotten Land
  • Untitled Goose Game

lostromb avatar Nov 12 '22 04:11 lostromb

I don't have things set up right now for more accurate perf numbers, but anecdotally this appears to speed up Smash and Pokemon Sword by about 4% (again, with the same config as in the previous comment)

lostromb avatar Nov 12 '22 06:11 lostromb

Hi lostromb, I have the same specs as you, except with an i5-9500F. Does this PR also significantly improve Bayonetta 3 performance?

Jayzee2008 avatar Nov 19 '22 03:11 Jayzee2008

I don't have Bayonetta 3 unfortunately. But I did want to revisit this change with detailed numbers for other games just to check my assumptions. And yeah..... there's not much change here, positive or negative. So obviously the perf difference depends on how the game engine behaves.

image image

lostromb avatar Nov 23 '22 22:11 lostromb