Ryujinx
Implement HLE macro for DrawElementsIndirect
This implements an HLE macro for the DrawElementsIndirect NVN macro, improving performance in games that use this type of draw. This is mostly extracted from #3306.
Affected games:
Nier Automata: The End of YoRHa Edition. Lower FIFO % and slightly better performance in some areas.

Monster Hunter Rise. Lower FIFO % and better performance in some areas (hard to tell how much better without unlocking the frame rate).

Marvel Ultimate Alliance 3. Unfortunately this one won't really be improved since it also uses conditional rendering with GPU-written buffers, so it still needs to flush data for that, which nullifies all benefits from this change. I'm including it here just to indicate that it needs to be tested to make sure it's not worse.
Nintendo Switch Sports, but this one is not playable right now due to other issues.
Problems:
- The conditional rendering comparison currently does not flush buffers before reading values. Marvel Ultimate Alliance 3 needs this, but since the counter values are in the same buffer as the indirect data, they were previously flushed together with the indirect data. Since this change skips that flush, I had to make the counter reads trigger a flush to avoid regressing this game.
- We have no way to know the index buffer size with indirect draws, since the sub-range that is used is part of the indirect data (`firstIndex` and `indexCount`). This creates various problems and complications.
  - We don't know how much buffer data we need to update for the draw. Right now it just assumes a maximum size and updates all that.
  - ~~Can't convert index buffers on Vulkan, so any games using indirect draws with quads won't work properly. This can be fixed by doing the conversion in compute while reading the range from the indirect data there too.~~ This is now supported on 711ce42d176948a6e93127033ea6bccc09b7cd31.
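To make the index range problem concrete, here is a minimal Python sketch of how the used sub-range follows from the indirect data. It assumes the standard five-field indexed indirect command layout shared by OpenGL (`DrawElementsIndirectCommand`) and Vulkan (`VkDrawIndexedIndirectCommand`); the function name `index_byte_range` is hypothetical and is not taken from the Ryujinx code base.

```python
import struct

# Assumed layout: indexCount, instanceCount, firstIndex, baseVertex, baseInstance.
# All fields are 32-bit little-endian; baseVertex is the only signed one.
INDEXED_INDIRECT_FORMAT = "<IIIiI"
INDEXED_INDIRECT_SIZE = struct.calcsize(INDEXED_INDIRECT_FORMAT)  # 20 bytes


def index_byte_range(indirect_data: bytes, index_size: int) -> tuple:
    """Return (byte_offset, byte_size) of the index sub-range used by one draw.

    Hypothetical helper: this only works when the indirect data is readable
    on the CPU, which is exactly what cannot be assumed for GPU-written data.
    """
    fields = struct.unpack(INDEXED_INDIRECT_FORMAT, indirect_data[:INDEXED_INDIRECT_SIZE])
    index_count, _instance_count, first_index, _base_vertex, _base_instance = fields
    return first_index * index_size, index_count * index_size


# One draw of 6 indices starting at index 10, with 16-bit indices.
command = struct.pack(INDEXED_INDIRECT_FORMAT, 6, 1, 10, 0, 0)
offset, size = index_byte_range(command, index_size=2)
```

When the command is written by the GPU, the CPU-side copy may be stale, so a range computed this way cannot be trusted without flushing the buffer first.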
Other changes:
- The `MultiDrawIndirectCount` and `DrawIndirectCount` methods were combined to avoid code duplication.
- None of the indirect draw methods read the CPU-written indirect data anymore; they always assume a fixed index buffer size. That's because if the count is written from the GPU, the CPU-side data might contain garbage.
- The `MultiDraw*` methods were renamed, removing the `Multi` prefix. This matches the Vulkan API names and allows all draw methods to be grouped together when sorted alphabetically, which is pretty neat.
Testing is welcome.
Marvel seems to run fine, can't really see any difference but maybe a low end system might see something.
We got a report on Discord of a regression in Mario Golf: Super Rush with the previous change. The game uses indirect draws with quads, which Vulkan does not support, and we did not support index buffer conversion in this case because it requires access to the index count and first index on the CPU, which we don't have for indirect draws since they are stored in the indirect buffer. To support this case, I implemented index buffer conversion in compute. It works like this:
- The first dispatch reads the indirect data and determines the index buffer bounds. It then writes the new indirect data with the modified index count and first index. The primitive count is written into a buffer to be used as a parameter for the second dispatch, which is indirect.
- The second dispatch is indirect: X is written in the step above, and Y and Z are always 1. It has one invocation per primitive, so each invocation copies the indices for that primitive following the pattern specified for the topology conversion.
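The two dispatches can be modelled on the CPU roughly as follows. This is only an illustrative Python sketch, not the actual compute shader: the function names are hypothetical, the `(0, 1, 2, 0, 2, 3)` pattern is one common way to split a quad into two triangles, and it assumes the converted indices are written from the start of the output buffer so the new first index is 0.

```python
import struct

INDEXED_INDIRECT_FORMAT = "<IIIiI"  # indexCount, instanceCount, firstIndex, baseVertex, baseInstance
QUAD_TO_TRIANGLES = (0, 1, 2, 0, 2, 3)  # assumed conversion pattern for quads


def first_dispatch(indirect_data: bytes):
    """Read the indirect data, then emit patched indirect data and dispatch arguments."""
    index_count, instance_count, first_index, base_vertex, base_instance = struct.unpack(
        INDEXED_INDIRECT_FORMAT, indirect_data)
    primitive_count = index_count // 4  # four indices per quad
    patched = struct.pack(
        INDEXED_INDIRECT_FORMAT,
        primitive_count * len(QUAD_TO_TRIANGLES),  # modified index count
        instance_count,
        0,                                         # modified first index (assumption)
        base_vertex,
        base_instance)
    dispatch_args = (primitive_count, 1, 1)        # X from the data, Y and Z always 1
    return patched, dispatch_args, first_index


def second_dispatch(source_indices, first_index, dispatch_args):
    """One 'invocation' per primitive, each copying that primitive's indices."""
    primitive_count = dispatch_args[0]
    converted = [0] * (primitive_count * len(QUAD_TO_TRIANGLES))
    for primitive in range(primitive_count):       # stands in for the invocation ID
        base = first_index + primitive * 4
        for i, corner in enumerate(QUAD_TO_TRIANGLES):
            converted[primitive * len(QUAD_TO_TRIANGLES) + i] = source_indices[base + corner]
    return converted


# Two quads starting at index 2 of the source index buffer.
source = [9, 9, 10, 11, 12, 13, 20, 21, 22, 23]
draw = struct.pack(INDEXED_INDIRECT_FORMAT, 8, 1, 2, 0, 0)
patched_draw, args, first = first_dispatch(draw)
triangles = second_dispatch(source, first, args)
```

The point of the split is that nothing here needs to be known on the CPU ahead of time: the second pass sizes itself from what the first pass wrote.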
This also introduces "dependencies" on the CacheByRange. The index buffer being valid "depends" on the indirect buffer not being modified. So if the indirect buffer is modified, the cached index buffer has to be removed too. And for multi-draw, the indirect data being valid depends on the draw count not being modified, since it is used to determine the bounds of the indirect buffer. Now each entry on CacheByRange also stores a dependency list. When the entry is disposed, it also forces the removal of all the dependencies from their caches, which should cause them to be disposed too and cascade.
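The cascading removal described above can be sketched like this in Python. It is a simplified model under stated assumptions: the class names `RangeCache` and `Entry` and their methods are hypothetical stand-ins, and keys are plain `(offset, size)` tuples rather than whatever the real CacheByRange uses.

```python
class Entry:
    """A cached value plus the entries in other caches that depend on it."""

    def __init__(self, value):
        self.value = value
        self.dependencies = []  # list of (cache, key) pairs to remove with this entry
        self.disposed = False

    def add_dependency(self, cache, key):
        self.dependencies.append((cache, key))

    def dispose(self):
        if self.disposed:       # guard against disposing the same entry twice
            return
        self.disposed = True
        for cache, key in self.dependencies:
            cache.remove(key)   # removal disposes the dependent, which cascades further
        self.dependencies.clear()


class RangeCache:
    """Hypothetical stand-in for a per-buffer cache keyed by (offset, size)."""

    def __init__(self):
        self._entries = {}

    def add(self, key, value):
        entry = Entry(value)
        self._entries[key] = entry
        return entry

    def contains(self, key):
        return key in self._entries

    def remove(self, key):
        entry = self._entries.pop(key, None)
        if entry is not None:
            entry.dispose()


# Multi-draw chain: draw count -> patched indirect data -> converted index buffer.
count_cache, indirect_cache, index_cache = RangeCache(), RangeCache(), RangeCache()
count_entry = count_cache.add((0, 4), "draw count")
indirect_entry = indirect_cache.add((16, 40), "patched indirect data")
index_cache.add((0, 24), "converted indices")
count_entry.add_dependency(indirect_cache, (16, 40))
indirect_entry.add_dependency(index_cache, (0, 24))

count_cache.remove((0, 4))  # modifying the draw count invalidates everything downstream
```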
Testing is welcome.
The assumed buffer range is not large enough for some games, so I will just wait for #3775 and try updating the whole buffer then.
This is ready for review again. For the index buffer size problem, it now uses the contiguous mapped size starting at the index buffer address as the size, but still clamps it to a maximum. That is to avoid the buffer/region being checked being too large.
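As a rough illustration of that size rule, the sketch below walks a list of mapped regions and clamps the result. The region list, the function name `clamped_contiguous_size`, and the example limit are all assumptions made for the example; the real memory manager and its maximum are not shown in this thread.

```python
def clamped_contiguous_size(mapped_regions, address, max_size):
    """Size of the contiguous mapped range starting at address, capped at max_size.

    mapped_regions is a hypothetical list of non-overlapping (start, size) pairs.
    Returns 0 if the address itself is not mapped.
    """
    end = address
    for start, size in sorted(mapped_regions):
        if start + size <= end:
            continue                 # region lies entirely before the current end
        if start > end:
            break                    # gap: the contiguous range stops here
        end = start + size           # region covers or extends the current range
        if end - address >= max_size:
            break                    # no need to look further once the cap is hit
    return min(end - address, max_size)


regions = [(0x1000, 0x1000), (0x2000, 0x1000), (0x4000, 0x1000)]
unclamped = clamped_contiguous_size(regions, 0x1800, max_size=0x10000)
clamped = clamped_contiguous_size(regions, 0x1800, max_size=0x1000)
```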
This needs to be tested on games that use indirect draws to ensure that they did not regress (rendering is still correct, etc.):
- Nier Automata.
- Mario Golf: Super Rush.
- Monster Hunter Rise.
- Ghosts 'n Goblins: Resurrection.
- Bayonetta 3.
- Marvel Ultimate Alliance 3.
- Kirby and the Forgotten Land.
In addition to that, we need to make sure performance on those games did not regress compared to master:
- Monster Hunter Rise.
- Ghosts 'n Goblins: Resurrection.
- Bayonetta 3.
Found a consistent crash in Bayonetta 3 that, I believe, isn't present on master.
00:01:01.086 |S| HLE.OsThread.16 ServiceAm SetIdleTimeDetectionExtension: Stubbed. {idleTimeDetectionExtension: 0}
00:01:02.349 |E| HLE.OsThread.8 Application : Unhandled exception caught: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> LibHac.Common.HorizonResultException: ResultFsNonRealDataVerificationFailed (2002-4604): Hash error!
   at LibHac.Common.ThrowHelper.ThrowResult(Result result, String message)
   at LibHac.Tools.FsSystem.IntegrityVerificationStorage.ReadImpl(Int64 offset, Span`1 destination, IntegrityCheckLevel integrityCheckLevel)
   at LibHac.Tools.FsSystem.CachedStorage.ReadBlock(CacheBlock block, Int64 index)
   at LibHac.Tools.FsSystem.CachedStorage.GetBlock(Int64 blockIndex)
   at LibHac.Tools.FsSystem.CachedStorage.Read(Int64 offset, Span`1 destination)
   at LibHac.Tools.FsSystem.StorageStream.Read(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.Stream.Read(Span`1 )
   at LibHac.Tools.FsSystem.StreamStorage.Read(Int64 offset, Span`1 destination)
   at LibHac.FsSrv.Impl.StorageInterfaceAdapter.Read(Int64 offset, OutBuffer destination, Int64 size)
   at Ryujinx.HLE.HOS.Services.Fs.FileSystemProxy.IStorage.Read(ServiceCtx context)
   --- End of inner exception stack trace ---
   at System.RuntimeMethodHandle.InvokeMethod(Object , Span`1& , Signature , Boolean , Boolean )
   at System.Reflection.RuntimeMethodInfo.Invoke(Object , BindingFlags , Binder , Object[] , CultureInfo )
   at Ryujinx.HLE.HOS.Services.IpcService.CallHipcMethod(ServiceCtx context)
   at Ryujinx.HLE.HOS.Services.ServerBase.Process(Int32 serverSessionHandle, UInt64 recvListAddr)
   at Ryujinx.HLE.HOS.Services.ServerBase.ServerLoop()
   at Ryujinx.HLE.HOS.Services.ServerBase.Main()
   at Ryujinx.HLE.HOS.Kernel.Threading.KThread.ThreadStart()
   at System.Threading.Thread.StartCallback()
Ryujinx_1.1.0+03843f9_2022-11-02_01-40-54.log
Save file to replicate (keep running forward; after a cutscene an enemy will start aiming and the crash should happen almost instantly): 0.zip
Aside from this, the game renders normally and seems to stutter less, as far as I can tell.
No crashes here, I used your save file
Performance regression in Mario Golf compared to Master

Nier Automata, Marvel Ultimate Alliance 3, and Bayonetta 3 seem to look fine on my side.
Slight FPS improvement in Monster Hunter Rise

There might be a slight performance regression in Ghosts 'n Goblins: Resurrection, but it's hard to tell since the game's performance is generally poor on Vulkan.
It is about 1 to 2 FPS lower than master, from 42 to 40.
@S0gnat0re your crash is due to a broken game file, unrelated to this PR.
Bayonetta 3 performance seems fine so far :
Master :
Average framerate : 59.1 FPS
Minimum framerate : 56.2 FPS
Maximum framerate : 62.7 FPS
1% low framerate : 43.0 FPS
0.1% low framerate : 38.1 FPS
PR :
Average framerate : 59.2 FPS
Minimum framerate : 56.1 FPS
Maximum framerate : 63.4 FPS
1% low framerate : 43.6 FPS
0.1% low framerate : 35.7 FPS
Causes a regression in Kirby and the Forgotten Land. Objects in the distance start to disappear way too early.
Master: https://i.gyazo.com/5645bee40881723355a725042a716572.mp4
PR: https://i.gyazo.com/2f0631c1d1a9171135fede3eee0266d5.mp4
This is ready for one last round of testing. I believe that all reported issues should be fixed now. The approach has been changed to assume the index count and first index are written from the CPU. The other approach of using the entire mapped range had a high performance impact on some games. I plan to add an "Enable HLE macros" option in the UI, which is not done yet, but it shouldn't affect testing.
The Kirby regression is fixed now :) Good job :)
Is there a way to implement this in the LDN version of Ryujinx?
I added more HLE macros due to what was noted on #3847. It can further improve performance by a little bit in some cases, but most games probably won't have visible improvements. Regardless, it's worth re-testing to make sure they are still working.
A new Enable Macro HLE option was added to the settings on GTK and Avalonia.
The regressions in Ghosts 'n Goblins: Resurrection on OpenGL and in Mario Golf: Super Rush have been resolved. I wanted to report that back here. Nothing out of the ordinary was found while testing other games either.
Hi, what do I have to do to run this version of Ryujinx on Linux? It says it couldn't find prod.keys, but there's no folder to paste it into, so I'm very confused :(
