[WIP][TESTERS NEEDED] SPU LLVM: PUTLLC 16 Optimization

Open elad335 opened this issue 5 years ago • 29 comments

I've noticed that many SPU atomic ops do not use the entire 128-byte memory line provided for them; some only use a single 16-byte block. Usually about 20% to 35% of all atomic ops only use one 16-byte block. This raises a thought: instead of executing a full 128-byte atomic PUTLLC, we can replace it with a 16-byte version, which can then be optimized with cmpxchg16b.

Detection works by analyzing all stores/loads between GETLLAR and PUTLLC. If all LS stores/loads are of the same type and use the same register state as the other stores/loads, the pattern is detected. Note that if only loads are detected in the loop, even if they are not of the same type and may use different addresses, this counts as if no memory operation was executed in the atomic op at all. For safety, any branch target falling into the GETLLAR<->PUTLLC range cancels this optimization, although I haven't encountered such a case yet. Fixes GOWA with RSX reservations off, but only for this game, as the pattern is detected well for it.
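To make the idea above concrete, here is a minimal sketch, not the PR's actual code. It assumes a heavily simplified detection rule (all stores in the loop land in the same 16-byte block of the 128-byte line), which is much weaker than the type and register-state analysis described above. The names `LsAccess`, `single_store_block` and `putllc16` are hypothetical, and a `std::mutex` stands in for a hardware 16-byte compare-exchange such as cmpxchg16b so that the example stays portable.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <vector>

// Hypothetical, simplified view of one access inside a GETLLAR..PUTLLC loop.
// Offsets are relative to the start of the 128-byte reservation line.
struct LsAccess {
    bool is_store;
    std::uint32_t offset; // byte offset within the line
};

// Returns the index (0..7) of the only 16-byte block that is stored to,
// or -1 if stores touch several blocks or there are no stores at all.
// Assumption: each store stays inside one 16-byte block. Loads are ignored.
int single_store_block(const std::vector<LsAccess>& accesses) {
    int block = -1;
    for (const LsAccess& a : accesses) {
        if (!a.is_store) continue;
        const int b = static_cast<int>((a.offset % 128u) / 16u);
        if (block != -1 && block != b) return -1; // second block written
        block = b;
    }
    return block;
}

using Line = std::array<std::uint8_t, 128>;

// Portable stand-in for the atomicity that cmpxchg16b would provide.
static std::mutex g_line_mutex;

// 16-byte variant of the conditional store: succeed only if the chosen block
// of `memory` still equals the GETLLAR snapshot, then publish the new block.
bool putllc16(Line& memory, const Line& snapshot, const Line& updated, unsigned block) {
    const std::size_t off = static_cast<std::size_t>(block) * 16u;
    std::lock_guard<std::mutex> lock(g_line_mutex);
    if (std::memcmp(memory.data() + off, snapshot.data() + off, 16) != 0)
        return false; // block changed since the snapshot: store fails
    std::memcpy(memory.data() + off, updated.data() + off, 16);
    return true;
}
```

In this sketch a caller would first run `single_store_block` over the loop's accesses and fall back to the full 128-byte path when it returns -1. Note that `putllc16` only compares the selected block, so a concurrent change elsewhere in the line does not fail the store; whether that is acceptable is what the real pattern analysis has to establish.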

In all cases where this optimization is detected, it behaves as if "Accurate RSX reservations" was enabled, so test this setting as well for performance comparisons, along with TSX vs no-TSX.

Turn of events:

  • In order to complete this pull request properly, I implemented advanced cross-block SPU code analysis that retains pattern data across "future" and "past" code blocks. This is probably one of the most important SPU pull requests in years, laying the groundwork for many optimizations to come.

Acknowledgment: I want to express my gratitude to Iftah Marash; I could not have done this without him.

elad335 avatar Aug 07 '20 12:08 elad335

I tested 3 games so far, following the 3 test runs provided:

  • TSX on, RSX setting doesn't matter
  • TSX off, RSX reservations enabled
  • TSX off, RSX reservations disabled

SPU GETLLAR polling detection was set to 'True' in all the games tested

Games

  • One Piece: Pirate Warriors - No change in performance
  • Resistance 2 - No change in performance
  • Killzone HD - With TSX off (RSX reservations enabled or disabled), performance decreases by 3-4 fps compared with TSX on. The same issue happens on master as well.

web1018 avatar Aug 08 '20 18:08 web1018

UnWIPed.

elad335 avatar Aug 11 '20 15:08 elad335

This PR on Linux causes RPCS3 to crash before creating the game window. Log doesn't seem to show much, but on the terminal I get *** stack smashing detected ***: terminated.

Seems to happen on all games, with or without cache. Using the SPU Interpreters (Fast or Precise) avoids the crash; the Recompilers crash. RPCS3.log

RainbowCookie32 avatar Aug 13 '20 01:08 RainbowCookie32

Needs testing.

elad335 avatar Jul 27 '23 10:07 elad335

Need for Speed Most Wanted doesn't seem to boot.

Master: NFS most wanted 1

PR: NFS most wanted RPCS3.log.gz

Ordinary205 avatar Jul 27 '23 12:07 Ordinary205

Enable SPU Debug in debug tab and re-upload log with the latest build.

elad335 avatar Jul 27 '23 16:07 elad335

Here's the re-uploaded log: RPCS3.log.gz

Ordinary205 avatar Jul 27 '23 16:07 Ordinary205

@Ordinary205 Sorry to request again, can you re-upload log with the latest build? I made a mistake dumping debug information.

elad335 avatar Jul 27 '23 17:07 elad335

It's all right RPCS3.log.gz

Ordinary205 avatar Jul 27 '23 18:07 Ordinary205

Still doesn't boot properly. Same error 1 RPCS3.log.gz

Ordinary205 avatar Jul 27 '23 19:07 Ordinary205

Pushed a fix, please retest.

elad335 avatar Jul 28 '23 11:07 elad335

Nothing changed. Not really sure what's going on with this issue. Need for speed most wanted RPCS3.log.gz

Here's a log from the official build in case it's useful. RPCS3.log.gz

Ordinary205 avatar Jul 28 '23 12:07 Ordinary205

The SPURS test doesn't complete the PUTLLC test. I also have NFS MW in the log: RPCS3.log

Darkhost1999 avatar Jul 28 '23 12:07 Darkhost1999

Fixed spu test, retest.

elad335 avatar Jul 28 '23 12:07 elad335

Also pushed an experimental optimization.

elad335 avatar Jul 28 '23 12:07 elad335

@elad335 Sorry for the disappointment, but it's still not fixed. RPCS3.log.gz

Ordinary205 avatar Jul 28 '23 13:07 Ordinary205

Fixed Need For Speed bug. retest.

elad335 avatar Jul 29 '23 10:07 elad335

Need for Speed Most Wanted now boots properly. After RPCS3.log.gz

Ordinary205 avatar Jul 29 '23 10:07 Ordinary205

Tested my usual 5 game suite (Persona 5, DeS, GOW 3, GOW A, TLOU) with this PR.

Every game booted and played just fine, performance was within margin of error, except for Demon's Souls which gained 6 FPS compared to latest master.

Master - 104 FPS DeS master

This PR - 110 FPS DeS PR

solarmystic avatar Jul 29 '23 11:07 solarmystic

More patterns are detected now if you want to retest.

elad335 avatar Jul 29 '23 13:07 elad335

Here's a comparison for Midnight Club LA.

Master: 47.1/50/53.2 FPS

PR: 49/50.7/52 FPS

Master seems to fluctuate, while the PR seems slightly steadier.

Ordinary205 avatar Jul 29 '23 16:07 Ordinary205

With the latest PR commits, every game in my 5 game test bench now exhibits noticeable performance improvements, although God of War Ascension now has some flickering not present in current master.

Demon's Souls improves even further from 110 to 112 FPS average now.

Video demonstration of flickering (seizure warning):-

https://github.com/RPCS3/rpcs3/assets/4345150/05f8aa83-9d59-4ba4-96fa-8d9b835fabcd

Zipped SS of results putllc pr.zip

solarmystic avatar Jul 29 '23 16:07 solarmystic

Testing all games PS3 E SIG: Thread [Vulkan Device Enumeration Thread] is too sleepy. Waiting for it 31100.867µs already!

Joaozin-tech avatar Jul 30 '23 15:07 Joaozin-tech

> Testing all games PS3 E SIG: Thread [Vulkan Device Enumeration Thread] is too sleepy. Waiting for it 31100.867µs already!

That message is generic. Do you have any context or the log file to assist in understanding what you're saying? Did games freeze while playing them that didn't on master branch?

Darkhost1999 avatar Jul 30 '23 18:07 Darkhost1999

> That message is generic. Do you have any context or the log file to assist in understanding what you're saying? Did games freeze while playing them that didn't on master branch?

In general, games tend to be slow when this error message from the Vulkan API appears. I deleted everything here, so I have no logs. I went back to the master branch, where it doesn't happen at any time. Thanks, friends.

Joaozin-tech avatar Jul 30 '23 20:07 Joaozin-tech

I'd like to test a build on my 14600KF. Could you provide a download soon, @elad335?

EmulationChannel avatar Mar 06 '24 20:03 EmulationChannel

It's very much WIP though; I added some experimental concepts. I pushed because it had long-running conflicts with the transition to SPULLVMRecompiler.cpp and SPUCommonRecompiler.cpp.

elad335 avatar Mar 06 '24 20:03 elad335

> It's very much WIP though; I added some experimental concepts. I pushed because it had long-running conflicts with the transition to SPULLVMRecompiler.cpp and SPUCommonRecompiler.cpp.

I completely understand. Once the build compiles successfully, I can do some testing: The Last of Us, GOW 3, Uncharted 1, 2 and 3, Killzone lol

EmulationChannel avatar Mar 06 '24 20:03 EmulationChannel

F {SPU Worker 11} SIG: Thread terminated due to fatal error: Out of range
(in file D:\a\1\s\rpcs3\Emu\Cell\SPUCommonRecompiler.cpp:4292[:22], in function operator ())

RPCS3.log

Darkhost1999 avatar Apr 09 '24 14:04 Darkhost1999

Superseded by #15429

elad335 avatar May 22 '24 04:05 elad335