[TESTERS NEEDED AGAIN] SPU: PUTLLC16 Optimization, SPU Analyzer capabilities upgrade

Open elad335 opened this issue 1 year ago • 55 comments

For a while, I had a few complex SPU optimizations in mind. One of them was the "PUTLLC16" loop optimization (see #8703). The concept itself was great: detect atomic loops in SPU code that update at most 16 bytes of data, in order to bridge the gap between the atomic operation capacity of x86 and ARM, which is 16 bytes, and that of the Cell BE's SPU architecture, which is a whopping 128 bytes. In theory, if we can analyse the code to detect when the atomic loop can only update 16 bytes (about a third of all SPU atomic loops in games are coded this way), the performance of that code would increase dramatically, especially on non-TSX CPUs, for which the implementation is slower than on TSX.

But as I started implementing analysis to detect this pattern across a variety of code from games, things started to get entangled, and many hacks were put into the original pull request in order to support as many code variations as possible for different code flows (mainly single backward loops and a single forward if inside the atomic update). This is both hacky and less valuable than equipping the SPU analyzer with cross-block analysis, which allows more optimizations derived from it in the future and detection of all possible 16-byte atomic loop cases. This was no simple task: the underlying algorithm was so difficult to resolve that it took me a whole year. It was worth it, though.
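To make the size gap concrete: a 128-byte reservation spans eight aligned 16-byte quadwords, so a loop can be committed with a single 16-byte host atomic only if every store into the reservation lands in the same quadword. The C++ sketch below illustrates just that arithmetic. It is not RPCS3's analyzer, which works on decoded SPU code across blocks; `StoreAccess`, `touched_quadwords` and `fits_16_byte_commit` are hypothetical names, and store offsets are assumed to be already resolved relative to the reservation base.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Sizes from the description above (hypothetical constant names).
constexpr std::uint32_t kReservationSize = 128; // SPU reservation granule, bytes
constexpr std::uint32_t kHostAtomicSize = 16;   // x86/ARM atomic capacity, bytes
constexpr std::uint32_t kQuadwordsPerLine = kReservationSize / kHostAtomicSize; // 8

// Hypothetical: one store into the reservation, as a byte offset from the
// reservation base plus its size (SPU quadword stores are 16 bytes).
struct StoreAccess {
    std::uint32_t offset;
    std::uint32_t size;
};

// Returns a bitmask of the 16-byte quadwords touched by the given stores.
std::bitset<kQuadwordsPerLine> touched_quadwords(const std::vector<StoreAccess>& stores) {
    std::bitset<kQuadwordsPerLine> mask;
    for (const StoreAccess& s : stores) {
        assert(s.size > 0 && s.offset + s.size <= kReservationSize);
        const std::uint32_t first = s.offset / kHostAtomicSize;
        const std::uint32_t last = (s.offset + s.size - 1) / kHostAtomicSize;
        for (std::uint32_t q = first; q <= last; ++q) {
            mask.set(q);
        }
    }
    return mask;
}

// Candidate for a single 16-byte host atomic: at most one quadword modified.
bool fits_16_byte_commit(const std::vector<StoreAccess>& stores) {
    return touched_quadwords(stores).count() <= 1;
}
```

For example, a single 16-byte store at offset 0x20 qualifies, while an unaligned 16-byte store at offset 0x08 straddles two quadwords and does not.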

Please test game performance. The difference will probably not be huge, but it should be noticeable in titles that show a gap between TSX and non-TSX CPUs.

Significant performance improvements have been noted in Red Dead Redemption, Spider-Man: Web of Shadows, Metal Gear Solid 4 and Metal Gear Solid Online. Note that the changes are CPU-dependent.

What to expect and test:

  • SPU usage differences.
  • Performance differences.
  • Game compatibility and stability breakage.

Example of a simple SPU atomic loop in which only 16 bytes of the reservation are modified (notice how STQR and LQR both address the same offset and no other load/store types are used): image
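The pattern in that screenshot can be expressed as a simple check. The C++ sketch below is a hypothetical simplification, not the PR's code: it assumes the instructions between GETLLAR and PUTLLC have already been decoded into a flat list, that LQR/STQR targets (PC-relative, so fixed for a given instruction) are already resolved to local-storage addresses, and it ignores control flow, which is exactly what the cross-block analysis in this PR has to handle. `Op`, `Insn` and `match_simple_putllc16` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified view of the instructions between GETLLAR and PUTLLC.
enum class Op { LQR, STQR, OtherLoad, OtherStore, NonMemory };

struct Insn {
    Op op;
    std::int32_t target; // LQR/STQR: resolved local-storage address (PC + immediate)
};

// Returns the single address shared by all LQR/STQR accesses if the body
// matches the simple pattern, otherwise std::nullopt.
std::optional<std::int32_t> match_simple_putllc16(const std::vector<Insn>& body) {
    std::optional<std::int32_t> addr;
    bool has_store = false;
    for (const Insn& insn : body) {
        switch (insn.op) {
        case Op::NonMemory:
            continue; // arithmetic, branches, etc. are ignored in this sketch
        case Op::OtherLoad:
        case Op::OtherStore:
            return std::nullopt; // any other load/store type breaks the pattern
        case Op::LQR:
        case Op::STQR:
            assert(insn.target >= 0 && insn.target < 0x40000); // 256 KiB local storage
            if (addr && *addr != insn.target) {
                return std::nullopt; // accesses to different addresses
            }
            addr = insn.target;
            has_store = has_store || insn.op == Op::STQR;
            break;
        }
    }
    // A loop that only reads is not the "modify 16 bytes" pattern.
    if (!has_store) {
        return std::nullopt;
    }
    return addr;
}
```

A real analyzer also has to prove that the shared address lies inside the local-storage copy of the reservation; that step is left out here.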

elad335, Apr 11 '24 18:04

Also, please put the SPU changes in a separate PR from all the progress-related changes.

Megamouse, Apr 11 '24 20:04

Slightly worse performance for me in MGS4.

12700K @ stock w/ AVX-512

PR

Screenshot from 2024-04-11 14-28-59

Screenshot from 2024-04-11 14-21-45

Master

Screenshot from 2024-04-11 14-24-44

Screenshot from 2024-04-11 14-23-42

cipherxof, Apr 11 '24 21:04

No real discernible difference in MGS4

9900K @ 4.80 GHz

PR w/ TSX Enabled

image

image

PR w/ TSX Disabled

image

image

Master w/ TSX Enabled

image

image

Master w/ TSX Disabled

image

image

Nishikoi, Apr 11 '24 23:04

Can someone retest this? I pushed many changes.

elad335, Apr 19 '24 17:04

I could give it a try. Do you recommend any games?

A5362, Apr 19 '24 19:04

Metal Gear Online hangs while building the SPU cache. I'm also forced to close RPCS3 via the Task Manager.

The log contains a bunch of these:

F {SPU Worker 7} SIG: Thread terminated due to fatal error: Verification failed
(in file F:\rpcs3\rpcs3\Emu\Cell\SPUCommonRecompiler.cpp:6968[:7], in function evaluate_start_state)

as well as

S SPU: PUTLLC16 Pattern Detected! (put_pc=0x6814, is_pc_rel=0, offset=0x0, is_const=0, Gd63vsaR9xJkYQH5C22uKCF6tXbR) (putllc0=0, putllc16+0=37, all=38)

RPCS3.zip

cipherxof, Apr 19 '24 19:04

@cipherxof don't use SPU Mega or Giga mode for now when testing this pull request.

elad335, Apr 19 '24 20:04

@cipherxof don't use SPU Mega or Giga mode for now when testing this pull request.

True, a game I tried ran for half a second and then crashed.

A5362, Apr 19 '24 20:04

@cipherxof don't use SPU Mega or Giga mode for now when testing this pull request.

Safe hung up before as well, but after your recent commits Safe works fine.

Seeing a pretty decent performance increase with MGO.

Without PR (Safe)

main-safe

Without PR (Giga)

main-giga

PR (Safe)

pr-hack-safe

At the risk of beating a dead horse, I feel like I should mention this again. MGO has unplayable performance on the master build. I'm talking 50fps in the same spot I posted in the above screenshots. We have to use this commit (https://github.com/RPCS3/rpcs3/pull/12030/commits/ef6e9bb42040999d4296a704c10c4b51e512c086) to achieve playable performance (with spu_accurate_dma disabled). I thought it might be worth mentioning in case it fits within the scope of this PR :D

cipherxof, Apr 19 '24 21:04

RPCS3.log: I have a crash in Sonic Unleashed after the first level.

A5362, Apr 19 '24 21:04

Top is PR, bottom is Master in MGS4

mgs4

Final Fantasy XIII with the Interpreter (the only way to make it boot) crashes in the PR with this log:

F {SPU Worker 1} SIG: Thread terminated due to fatal error: Verification failed
(in file D:\a\1\s\rpcs3\Emu\Cell\SPUCommonRecompiler.cpp:5278[:3], in function analyse)

while in Master it has no issues.

EDIT: After the third try, it booted without issues.

bigol83, Apr 19 '24 22:04

It started working on Giga actually. It's about 10 fps faster on average than on safe, and about 30 fps faster on average than without this PR.

MGO specifically seems to be very sensitive to PPU/SPU reservations. It doesn't really affect MGS4 in the same way.

image

cipherxof, Apr 19 '24 22:04

I checked Red Dead Redemption too and it's 2-5 fps slower on average compared to Master. This is consistent, even after shaders compiled.

bigol83, Apr 19 '24 23:04

Updated the build; please retest performance in both Safe and Mega mode.

elad335, Apr 20 '24 20:04

Metal Gear Online seems pretty much the same as yesterday's build.

PR (Latest/Giga)

image

PR (Previous/Giga)

image

The emulator froze when switching to Mega, and upon reboot it crashed at the title screen with:

·F 0:00:30.592655 {SPU[0x0000200] Thread (MGS4_URGENTCellSpursKernel0) [0x218e8]} SIG: Thread terminated due to fatal error: Unknown STOP code: 0x0 (op=0x0, Out_MBox=empty)
(in file F:\rpcs3\rpcs3\Emu\Cell\SPUThread.cpp:6302[:22], in function stop_and_signal)

RPCS3.log

Upon a 3rd reboot, I got further in the menus but ended up at a deadlock:

RPCS3.log

cipherxof, Apr 20 '24 21:04

Same here with Red Dead Redemption: performance looks the same as yesterday, but with Mega SPU it also hangs many times before reaching the main menu. It just shows a black screen. I have to close the emulator and reopen it to reach the main menu and get in-game.

bigol83, Apr 20 '24 22:04

Please retest

elad335, Apr 21 '24 07:04

Still the same performance (quite a bit lower compared to Master) and still the same black screen hangs before reaching the main menu in Red Dead Redemption.

I would say it hangs about 30% of the times I run the game.

bigol83, Apr 21 '24 09:04

Same issue. Reverting 24fa77f04406a76cab0878942929ee355c8f7190 fixes it.

By the way, this affects both Safe and Mega. Giga seems to work fine though.

cipherxof, Apr 21 '24 20:04

Implemented LQX/STQX-based atomic loop detection, fixed some bugs. @bigol83 @cipherxof
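Indexed loads and stores make the match harder: LQX/STQX compute the address at run time as RA + RB, so there is no constant address to compare. One conservative way to sketch it, assumed here and not taken from the PR, is to compare the unordered register pair of every access and give up if either register is redefined after the first access. `Kind`, `XInsn` and `match_indexed_putllc16` are hypothetical names, and control flow is again ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical, simplified instruction record (not RPCS3's decoder types).
enum class Kind { LQX, STQX, Write, Other };

struct XInsn {
    Kind kind;
    int ra = -1; // first address register of an LQX/STQX
    int rb = -1; // second address register of an LQX/STQX
    int rt = -1; // destination register of any other register-writing instruction
};

// Returns the normalized {low, high} address-register pair shared by every
// LQX/STQX in the body, or std::nullopt if the accesses cannot be proven to
// hit the same address. Conservative: any redefinition of an address register
// after the first access rejects the loop.
std::optional<std::pair<int, int>> match_indexed_putllc16(const std::vector<XInsn>& body) {
    std::optional<std::pair<int, int>> key;
    bool has_store = false;
    for (const XInsn& insn : body) {
        if (insn.kind == Kind::Other) {
            continue;
        }
        if (insn.kind == Kind::Write) {
            if (key && (insn.rt == key->first || insn.rt == key->second)) {
                return std::nullopt; // RA or RB may now hold a different value
            }
            continue;
        }
        // The SPU has 128 general-purpose registers.
        assert(insn.ra >= 0 && insn.ra < 128 && insn.rb >= 0 && insn.rb < 128);
        // RA + RB is commutative, so {ra, rb} and {rb, ra} are the same address.
        const std::pair<int, int> k = std::minmax(insn.ra, insn.rb);
        if (key && *key != k) {
            return std::nullopt;
        }
        key = k;
        has_store = has_store || insn.kind == Kind::STQX;
    }
    if (!has_store) {
        return std::nullopt;
    }
    return key;
}
```

Comparing registers instead of values misses cases where two different register pairs happen to hold the same sum; a value-tracking analysis could accept those, at the cost of more complexity.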

elad335, Apr 24 '24 07:04

I'm sorry to report that with the latest commit, Red Dead Redemption still hangs before reaching the main menu. Now it seems to happen even more frequently. I'm using Mega. No performance improvements compared to previous commits.

bigol83, Apr 24 '24 12:04

@bigol83 Hi, are you using the RSX Atomic Fifo setting? Also, can you upload a log?

elad335, Apr 24 '24 14:04

@bigol83 Hi, are you using the RSX Atomic Fifo setting? Also, can you upload a log?

Yes, i'm using RSX Atomic Fifo.

Here is the log: RPCS3.log.gz

bigol83, Apr 24 '24 18:04

Implemented LQX/STQX-based atomic loop detection, fixed some bugs. @bigol83 @cipherxof

No more issues on my end. Booted a few times and didn't get any crashes or deadlocks.

cipherxof, Apr 24 '24 19:04

Found the bug; also fixed the RSX slowdown introduced by this PR.

elad335, Apr 25 '24 19:04

Also, there is now an additional optimization if you turn off SPU accurate reservations in the config file.

elad335, Apr 25 '24 19:04

RSX-bottlenecked situations seem to be better.

Before

image

After

oo

Here's where things get a bit more interesting. MGO now performs reasonably well without the previously mentioned SPU hack (https://github.com/RPCS3/rpcs3/commit/ef6e9bb42040999d4296a704c10c4b51e512c086) when Accurate SPU Reservations is disabled.

Note: These were taken using Mega SPU Block size, as Giga still exhibited severe performance issues without the hack.

image

Here's the same comparison using a fairly recent master build:

Before this PR

image

cipherxof, Apr 25 '24 21:04

Here's my results for MGS4 on Windows:

Master SPU Reservations Off

image

master-spuresoff2

Master SPU Reservations On

image

PR SPU Reservations Off

image

pr-spuresoff2

PR SPU Reservations On

pr-spureson

Linux (saw no difference with SPU res on/off):

Master

Screenshot from 2024-04-25 15-27-37

PR

pr-spureson

cipherxof, Apr 25 '24 22:04

Red Dead Redemption doesn't hang anymore; I tried lots of times and it never happened. However, performance for me is still lower than Master with the same settings. I made a comparison at the spot where performance is lowest: the difference is 2 fps on average, but during the intro there are spots where the PR is around 6 fps slower than Master.

Master master

PR pr

Also, setting SPU Reservations to False doesn't improve performance for me.

PR SPU Reservations True spu reservations on

PR SPU Reservations False spu reservations off

bigol83, Apr 25 '24 22:04

Hello, this is my first time testing for RPCS3 and using GitHub, so please excuse my mistakes.

Specs: 5800X3D (undervolted), RTX 3080 10 GB, 32 GB DDR4 3200 MHz

God of War III: all settings are from the GOW3 wiki page, plus the Disable SPU MLAA/MLAA patches, upscaled to 1440p. The tests were made after going through the first fight once to compile shaders. I couldn't test more than the first part of the tutorial due to time constraints; however, I haven't noticed any graphical errors or crashes (yet).

Master image

PR image

Please let me know how to improve my testing and how to provide accurate and useful information.

aikhalaf, Apr 26 '24 17:04