rpcs3
[WIP][TESTERS NEEDED] SPU LLVM: PUTLLC 16 Optimization
I've noticed that many SPU atomic ops do not use the entire 128 bytes of memory provided for them; some use only a single 16-byte block. Usually about 20% to 35% of all atomic ops use only one 16-byte block. This raises a thought: instead of executing a full 128-byte atomic PUTLLC, we can replace it with a 16-byte version, which can be optimized with cmpxchg16b. Detection works by analyzing all stores/loads between PUTLLC and GETLLAR. If all LS stores/loads are of the same type and use the same register state as the other stores/loads, the pattern is detected. Note that if only loads are detected in the loop, even if they are not of the same type and may use different addresses, this counts as if no memory operation was executed at all in the atomic op. For safety, any branch target falling into the GETLLAR<->PUTLLC range cancels this optimization, although I haven't encountered such a case yet. Fixes GOWA with RSX reservations off, but only for this game, as the pattern is detected well for it.
In every case where this optimization is detected, it behaves as if "Accurate RSX reservations" were enabled, so test this setting as well for performance comparisons, as well as TSX vs. no-TSX.
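To make the detection rule above concrete, here is a minimal sketch in C++. It is not the code from this PR: `MemOp`, `access_type`, `reg_state`, `Putllc16` and `classify` are hypothetical names, and the real analysis works on SPU instructions across code blocks rather than on a flat list.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of one LS access found between GETLLAR and PUTLLC.
enum class OpKind { Load, Store };

struct MemOp {
    OpKind kind;
    int access_type;          // hypothetical id of the load/store instruction form
    std::uint64_t reg_state;  // hypothetical fingerprint of the registers forming the address
};

enum class Putllc16 { Rejected, NoStores, SingleBlock };

// Classify one GETLLAR..PUTLLC range following the rules in the description.
Putllc16 classify(const std::vector<MemOp>& ops, bool branch_target_in_range)
{
    // Any branch target inside the range cancels the optimization.
    if (branch_target_in_range) return Putllc16::Rejected;

    bool has_store = false;
    for (const MemOp& op : ops) {
        if (op.kind == OpKind::Store) has_store = true;
    }

    // Loads only: treated as if no memory operation happened,
    // whatever their type or address.
    if (!has_store) return Putllc16::NoStores;

    // Otherwise every load and store must share one access type
    // and one register state (ops is non-empty here).
    for (const MemOp& op : ops) {
        if (op.access_type != ops.front().access_type ||
            op.reg_state != ops.front().reg_state) {
            return Putllc16::Rejected;
        }
    }
    return Putllc16::SingleBlock;
}
```

For example, a load and a store with the same type and register state classify as `SingleBlock`, while the same pair with a branch target inside the range is `Rejected`.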
Turnaround of events:
- In order to complete this pull request properly, I implemented advanced cross-block SPU code analysis that retains pattern data across "future" and "past" code blocks. This is probably one of the most important SPU pull requests in years, laying the groundwork for many optimizations to come.
In honor: I want to express my gratitude to Iftah Marash; I could not have done this without him.
I tested 3 games so far, following the 3 runs requested:
- TSX on, RSX setting doesn't matter
- TSX off, RSX reservations enabled
- TSX off, RSX reservations disabled
SPU GETLLAR polling detection was set to 'True' in all the games tested
Games
- One Piece: Pirate Warriors - No change in performance
- Resistance 2 - No change in performance
- Killzone HD - With TSX off (RSX reservations enabled or disabled), performance drops by 3-4 FPS compared with TSX on. The same issue happens on master as well.
UnWIPed.
This PR on Linux causes RPCS3 to crash before creating the game window. Log doesn't seem to show much, but on the terminal I get *** stack smashing detected ***: terminated.
Seems to happen on all games, with or without cache. Using the SPU interpreters (Fast or Precise) avoids the crash; the recompilers crash. RPCS3.log
Needs testing.
Enable SPU Debug in debug tab and re-upload log with the latest build.
Here's the re-uploaded log: RPCS3.log.gz
@Ordinary205 Sorry to request again, can you re-upload log with the latest build? I made a mistake dumping debug information.
It's all right RPCS3.log.gz
Still doesn't boot properly.
RPCS3.log.gz
Pushed a fix, please retest.
Nothing changed.
Not really sure what's going on with this issue.
RPCS3.log.gz
Here's a log of the official build in case it's useful. RPCS3.log.gz
The Spurs test doesn't complete the putllc test. I also have NFS MW in the log: RPCS3.log
Fixed spu test, retest.
Also pushed an experimental optimization.
@elad335 Sorry for the disappointment, but it's still not fixed. RPCS3.log.gz
Fixed the Need for Speed bug. Retest.
Need for Speed Most Wanted now boots properly.
RPCS3.log.gz
Tested my usual 5 game suite (Persona 5, DeS, GOW 3, GOW A, TLOU) with this PR.
Every game booted and played just fine, performance was within margin of error, except for Demon's Souls which gained 6 FPS compared to latest master.
Master - 104 FPS
This PR - 110 FPS
More patterns are detected now if you want to retest.
Here's a comparison for Midnight Club LA.
Master: 47.1/50/53.2 FPS
PR: 49/50.7/52 FPS
Master seems to fluctuate, while the PR seems slightly steadier.
With the latest PR commits, every game in my 5 game test bench now exhibits noticeable performance improvements, although God of War Ascension now has some flickering not present in current master.
Demon's Souls improves even further from 110 to 112 FPS average now.
Video demonstration of flickering (seizure warning):
https://github.com/RPCS3/rpcs3/assets/4345150/05f8aa83-9d59-4ba4-96fa-8d9b835fabcd
Zipped screenshots of results: putllc pr.zip
Testing all games PS3 E SIG: Thread [Vulkan Device Enumeration Thread] is too sleepy. Waiting for it 31100.867µs already!
That message is generic. Do you have any context or the log file to assist in understanding what you're saying? Did games freeze while playing them that didn't on master branch?
In general, games tend to be slow with this error message from the Vulkan API. I deleted everything here, so I have no logs. I went back to the master branch, where it doesn't happen at any time. Thanks, friends.
Build Test my 14600kf Download File the soon @elad335
It's very WIP though; I added some experimental concepts. I pushed because it had long-running conflicts with the transition to SPULLVMRecompiler.cpp and SPUCommonRecompiler.cpp
I completely understand. When the build compiles successfully I can do some testing: The Last of Us, GOW 3, Uncharted 1/2/3, Killzone lol
F {SPU Worker 11} SIG: Thread terminated due to fatal error: Out of range
(in file D:\a\1\s\rpcs3\Emu\Cell\SPUCommonRecompiler.cpp:4292[:22], in function operator ())
Superseded by #15429