rpcs3
[TESTERS NEEDED AGAIN] SPU: PUTLLC16 Optimization, SPU Analyzer capabilities upgrade
For a while, I have had a few complex SPU optimizations in mind. One of them was the "PUTLLC16" loop optimization (see #8703). The concept itself was great: detect atomic loops in SPU code that update at most 16 bytes of data, in order to bridge the gap between the atomic operation capacity of x86 and ARM, which is 16 bytes, and that of the Cell BE's SPU architecture, which is a whopping 128 bytes. So in theory, if we can analyse the code to detect when an atomic loop can only update 16 bytes (about a third of all SPU atomic loops in games are coded this way), the performance of that code would increase dramatically (especially on non-TSX CPUs, for which the implementation is slower compared to TSX).
But as I started implementing analysis to detect this pattern across a variety of code from games, things started to get tangled, and many hacks were put in the original pull request in order to support as many code variations as possible for different code flows (mainly single backward loops and a single forward if inside the atomic update). This is both hacky and less valuable than equipping the SPU analyzer with cross-block analysis, which allows more optimizations to derive from it in the future and detection of all possible 16-byte atomic loop cases. This was no simple task: the underlying algorithm was difficult as hell to work out, and it took me a whole year to do it. It was worth it though.
Please test the performance of games. The difference will probably not be huge, but it should be noticeable in titles that show a performance gap between TSX and non-TSX CPUs.
Significant performance improvements have been noted in Red Dead Redemption, Spider-Man Web Of Shadows, Metal Gear Solid 4 and Metal Gear Solid Online. Do note that the changes are CPU-dependent.
What to expect and test:
- SPU usage differences.
- Performance differences.
- Game compatibility and stability breakage.
Example of a simple SPU atomic loop with only 16 bytes of the reservation modified (notice how both STQR and LQR address the same offset and no other store/load types are used):
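The check this example points at (every load and store on the reservation data hitting the same 16-byte slot) can be sketched as a toy model. This is only an illustration under loud assumptions, not the analyzer from this PR: `Op`, `Insn` and `find_putllc16_slot` are hypothetical names, GETLLAR and PUTLLC are treated as plain marker instructions, addresses are assumed to be already-resolved local storage addresses, the buffer is assumed to be 128-byte aligned, and control flow is ignored entirely (no cross-block analysis). It needs C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Toy opcode set: hypothetical, not RPCS3's instruction decoder.
enum class Op { GetLLAR, Load, Store, PutLLC, Other };

struct Insn
{
    Op op;
    std::uint32_t addr = 0; // resolved local storage address (Load/Store only)
};

// Returns the 16-byte-aligned offset inside the 128-byte reservation buffer
// if every Load/Store between GetLLAR and PutLLC touches that same slot and
// at least one Store was seen. Returns std::nullopt when several slots are
// touched, nothing is stored, memory outside the buffer is accessed, or no
// PutLLC follows the GetLLAR (the toy model simply gives up in those cases).
std::optional<std::uint32_t> find_putllc16_slot(const std::vector<Insn>& loop, std::uint32_t buf_addr)
{
    std::optional<std::uint32_t> slot;
    bool in_atomic = false;
    bool stored = false;

    for (const Insn& i : loop)
    {
        switch (i.op)
        {
        case Op::GetLLAR:
            // Start of the atomic sequence: forget anything seen before.
            in_atomic = true;
            stored = false;
            slot.reset();
            break;
        case Op::PutLLC:
            if (!in_atomic || !stored) return std::nullopt;
            return slot;
        case Op::Load:
        case Op::Store:
        {
            if (!in_atomic) break;
            const std::uint32_t a = i.addr & ~15u; // quadword access: low 4 bits dropped
            if (a < buf_addr || a >= buf_addr + 128) return std::nullopt;
            const std::uint32_t off = a - buf_addr;
            if (slot && *slot != off) return std::nullopt; // a second slot is touched
            slot = off;
            stored = stored || i.op == Op::Store;
            break;
        }
        case Op::Other:
            break;
        }
    }

    return std::nullopt; // no PutLLC found
}
```

For a sequence such as GetLLAR, Load at 0x1010, Store at 0x1010, PutLLC with the buffer at 0x1000, the function reports slot offset 0x10. Moving the Store to 0x1020 makes it return `std::nullopt`, which is the case where the full 128-byte path would still be needed.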
Also, please put the SPU stuff in a separate PR from all the progress stuff.
Slightly worse performance for me in MGS4.
12700K @ Stock /w AVX-512
PR
Master
No real discernible difference in MGS4
9900K @ 4.80 GHz
PR /w TSX Enabled
PR /w TSX Disabled
Master /w TSX Enabled
Master /w TSX Disabled
Can someone retest this? I pushed many changes.
I could give it a try. Do you recommend any games?
Metal Gear Online hangs on building the SPU cache. I'm also forced to close RPCS3 via the task manager.
Log contains a bunch of these:
F {SPU Worker 7} SIG: Thread terminated due to fatal error: Verification failed
(in file F:\rpcs3\rpcs3\Emu\Cell\SPUCommonRecompiler.cpp:6968[:7], in function evaluate_start_state)
as well as
S SPU: PUTLLC16 Pattern Detected! (put_pc=0x6814, is_pc_rel=0, offset=0x0, is_const=0, Gd63vsaR9xJkYQH5C22uKCF6tXbR) (putllc0=0, putllc16+0=37, all=38)
@cipherxof don't use SPU Mega or Giga mode currently when testing this pull request.
True, a game I tried ran for half a second and then crashed.
Safe hung up before as well, but after your recent commits Safe works fine.
Seeing a pretty decent performance increase with MGO.
Without PR (Safe)
Without PR (Giga)
PR (Safe)
At the risk of beating a dead horse, I feel like I should mention this again. MGO has unplayable performance on the master build. I'm talking 50fps in the same spot I posted in the above screenshots. We have to use this commit (https://github.com/RPCS3/rpcs3/pull/12030/commits/ef6e9bb42040999d4296a704c10c4b51e512c086) to achieve playable performance (with spu_accurate_dma disabled). I thought it might be worth mentioning in case it fits within the scope of this PR :D
RPCS3.log I have a crash in Sonic Unleashed after the first level
Top is PR, bottom is Master in MGS4
Final Fantasy XIII with Interpreter, the only way to make it boot, crashes with this log in PR
F {SPU Worker 1} SIG: Thread terminated due to fatal error: Verification failed
(in file D:\a\1\s\rpcs3\Emu\Cell\SPUCommonRecompiler.cpp:5278[:3], in function analyse)
while in Master it has no issues.
EDIT: After the third try it booted without issues.
It started working on Giga actually. It's about 10 fps faster on average than on safe, and about 30 fps faster on average than without this PR.
MGO specifically seems to be very sensitive to PPU/SPU reservations. It doesn't really affect MGS4 in the same way.
I checked Red Dead Redemption too and it's 2-5 fps slower on average compared to Master. This is consistent, even after shaders compiled.
Updated build, retest performance on both safe and mega mode.
Metal Gear Online seems pretty much the same as yesterday's build.
PR (Latest/Giga)
PR (Previous/Giga)
Emulator froze when switching to Mega, and upon reboot I crashed at the title screen with:
F 0:00:30.592655 {SPU[0x0000200] Thread (MGS4_URGENTCellSpursKernel0) [0x218e8]} SIG: Thread terminated due to fatal error: Unknown STOP code: 0x0 (op=0x0, Out_MBox=empty)
(in file F:\rpcs3\rpcs3\Emu\Cell\SPUThread.cpp:6302[:22], in function stop_and_signal)
Upon a 3rd reboot, I got further in the menus but ended up at a deadlock:
Same here with Red Dead Redemption: performance looks the same as yesterday, but it also hangs many times before reaching the main menu with Mega SPU. It just shows a black screen. I have to close the emulator and reopen it to reach the main menu and get in-game.
Please retest
Still the same performance (quite a bit lower compared to Master) and still the same black screen hangs before reaching the main menu in Red Dead Redemption.
I would say it hangs about 30% of the times I run the game.
Same issue. Reverting 24fa77f04406a76cab0878942929ee355c8f7190 fixes it.
By the way, this affects both Safe and Mega. Giga seems to work fine though.
Implemented LQX/STQX based atomic loop detection, fixed some bugs. @bigol83 @cipherxof
I'm sorry to report that with the latest commit Red Dead Redemption still hangs before reaching the main menu. Now it seems to happen even more frequently. I'm using Mega. No performance improvements compared to previous commits.
@bigol83 Hi, are you using the RSX Atomic Fifo setting? Also, can you upload a log?
Yes, I'm using RSX Atomic Fifo.
Here is the log RPCS3.log.gz
No more issues on my end. Booted a few times and didn't get any crashes or deadlocks.
Found the bug, and also fixed the RSX slowdown imposed by this PR.
Also, there is now an additional optimization if you turn off SPU accurate reservations in the config file.
RSX bottlenecked situations seem to be better.
Before
After
Here's where things get a bit more interesting. MGO now performs reasonably well without the previously mentioned SPU hack (https://github.com/RPCS3/rpcs3/commit/ef6e9bb42040999d4296a704c10c4b51e512c086) when Accurate SPU Reservations is disabled.
Note: These were taken using Mega SPU Block size, as Giga still exhibited severe performance issues without the hack.
Here's the same comparison using a fairly recent master build:
Before this PR
Here are my results for MGS4 on Windows:
Master SPU Reservations Off
Master SPU Reservations On
PR SPU Reservations Off
PR SPU Reservations On
Linux (saw no difference with SPU res on/off):
Master
PR
Red Dead Redemption doesn't hang anymore. I tried lots of times and it never happened, but performance for me is still lower than Master using the same settings. I made a comparison in the spot where performance is lowest and the difference is 2 fps on average, but during the intro there are spots where the PR is around 6 fps slower than Master.
Master
PR
Also, SPU Reservations set to False doesn't improve performance for me
PR SPU Reservations True
PR SPU Reservations False
Hello, this is my first time testing for RPCS3 and using GitHub so please excuse my mistakes.
Specs: 5800X3D (undervolted), RTX 3080 10GB, 32GB DDR4 3200 MHz
God of War III: All settings used are from the wiki page of GOW3 + Disable SPU MLAA/MLAA patches and upscaled to 1440p. The tests were made after going through the first fight once for shader compiling. Couldn't test more than the first part of the tutorial due to time constraints, however I haven't noticed any weird graphical errors/crashes (yet).
Master
PR
Please inform me on how to improve my testing and how to provide accurate and useful information.