
[Regression] God of War III consistently causes kernel panics on macOS (#17327)

Open schm1dtmac opened this issue 4 months ago • 19 comments

Quick summary

When trying to play GOW3, it will consistently kernel-panic in the intro Gaia section once in-game on macOS using any RPCS3 build after #17327.

Details

I'll attach only the working log for now. Getting the regressed log is especially hard: when a panic occurs, macOS automatically reboots, and when you log back in, RPCS3 is forcibly relaunched (this is system behaviour upon panics that CANNOT be disabled), which instantly purges the log from the panicked session. To compensate, I've attached a panic report instead, and I'll try to find a way to get the log somehow. EDIT: Regressed log is now attached.

Patches don't affect the issue, although because of GOW3's long intro videos when starting a new game I was a bit naughty and used the video skip patches to make life easier. However, I have confirmed the issue with ALL patches disabled on working and non-working builds at least once.

To reproduce:
1- Use a build from after #17327 was merged, and an up-to-date 1.03 copy of GOWIII.
2- Stick with default settings plus the wiki modifications for GOWIII, though this shouldn't make much of a difference.
3- Use either no patches or just the video skip patches to get past the long intro cutscene.
4- Load the game, start a new game on any difficulty, and get past the first FMV to the scene where you first see Kratos.
5- On broken builds this is where the panic occurs; working builds proceed as normal.

Build with regression

v0.0.37-18055-aa50b0fb

Attach two log files

Working RPCS3 log: RPCS3.log.gz

Regressed RPCS3 log: paniclog.log

Regressed panic report from macOS: panic-full-2025-08-23-214300.0002.txt

Attach capture files for visual issues

No response

System configuration

OS: macOS 15.6.1.

Hardware: MacBook Air M2 (2022). Memory: 8GB Unified. CPU: M2, 4P 4E. GPU: M2, 8 cores.

Other details

No response

schm1dtmac avatar Aug 23 '25 20:08 schm1dtmac

Confirmed, the issue happens on the latest macOS 15.6.0 on an M4 Mac Mini (4P+6E) w/16GB RAM. I was able to generate a log for the x86-64 version of RPCS3 but none for ARM, for the reason already mentioned in this post.

RPCS3(1).log gow 3 crash log arm64.log

YuriNator557 avatar Aug 23 '25 22:08 YuriNator557

Through some blood, sweat and tears I managed to extract that damn regressed log (had to boot into macOS Recovery instantly after the panic before the Mac boots back up, then copy out the log using Terminal), I've attached it to the main post now. Tagging @kd-11 since this pertains to the recent VK changes.

schm1dtmac avatar Aug 24 '25 11:08 schm1dtmac

@schm1dtmac An actual panic? Or just appcrash? If it actually panics that's really not our bug.

What concerns me is basically that on MacOS we're going through MVK, Metal and then apple's weird firmware driver systems. The bug lies somewhere down there, even if we triggered it by calling something unimplemented/incomplete/etc. We don't have a native metal backend yet, and I for one am not looking forward to debugging macOS kernel issues which kinda just ties my hands on this one for now.

kd-11 avatar Aug 24 '25 18:08 kd-11

Btw - the x86-64 build works fine, right? That would eliminate the error from the vulkan API code since it would be emitting the same commands. But maybe something is wrong with moltenvk for arm64?

kd-11 avatar Aug 24 '25 18:08 kd-11

No, the x86-64 build still crashes the entire computer.

YuriNator557 avatar Aug 24 '25 18:08 YuriNator557

Too much memory used for descriptors maybe? Not sure why it wouldn't just throw the correct error code. My guess is nobody knows which layer should handle it. MacOS throws it up to user, Metal throws it up to MVK which doesn't properly handle things. Since the whole PC crashes, it is impossible to even know which call failed and debugging tools are useless.

We can first check if lowering pool size helps: https://github.com/RPCS3/rpcs3/blob/master/rpcs3/Emu/RSX/VK/vkutils/descriptors.h#L42 Set to like 256 or something, see if things improve.

Failing that, this needs to go up the chain (khronos/mvk -> apple) and hope they fix their parts until we get to a state where the app crashes normally and can be debugged or MVK throws one of the VK_ERROR_XXXXX enums that we're supposed to get on vulkan API call failure. I think we may have accidentally stumbled on some serious OS-level issue.

kd-11 avatar Aug 24 '25 18:08 kd-11
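Editorial sketch of the behaviour kd-11 describes above as the expected one: a bounded descriptor pool that reports exhaustion through an error code the caller can check, instead of failing somewhere lower in the stack. This is a toy model and not RPCS3 or MoltenVK code. `Result`, `DescriptorPool` and `describe` are hypothetical names; only the numeric values are taken from the Vulkan `VkResult` enum, and 256 is the pool size suggested in the comment.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a few VkResult codes (numeric values match the Vulkan headers).
enum class Result : std::int32_t {
    Success = 0,                        // VK_SUCCESS
    ErrorOutOfDeviceMemory = -2,        // VK_ERROR_OUT_OF_DEVICE_MEMORY
    ErrorDeviceLost = -4,               // VK_ERROR_DEVICE_LOST
    ErrorOutOfPoolMemory = -1000069000  // VK_ERROR_OUT_OF_POOL_MEMORY
};

// Hypothetical pool with a fixed number of descriptor sets (e.g. 256).
class DescriptorPool {
public:
    explicit DescriptorPool(std::uint32_t max_sets) : m_max_sets(max_sets) {
        assert(max_sets > 0 && "pool must hold at least one set");
    }

    // Returns an error code on exhaustion instead of crashing.
    Result allocate(std::uint32_t count) {
        if (count > m_max_sets - m_used) {
            return Result::ErrorOutOfPoolMemory;
        }
        m_used += count;
        return Result::Success;
    }

    std::uint32_t used() const { return m_used; }

private:
    std::uint32_t m_max_sets;
    std::uint32_t m_used = 0;
};

// Maps a result code to the name a log line would show.
std::string describe(Result r) {
    switch (r) {
        case Result::Success:                return "VK_SUCCESS";
        case Result::ErrorOutOfDeviceMemory: return "VK_ERROR_OUT_OF_DEVICE_MEMORY";
        case Result::ErrorDeviceLost:        return "VK_ERROR_DEVICE_LOST";
        case Result::ErrorOutOfPoolMemory:   return "VK_ERROR_OUT_OF_POOL_MEMORY";
    }
    return "unknown";
}
```

With a pool of 256 sets, a request that does not fit comes back as `Result::ErrorOutOfPoolMemory`, which the application can log and handle. The point made in the thread is that the panic prevents any such code from ever reaching the application.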

> @schm1dtmac An actual panic? Or just appcrash? If it actually panics that's really not our bug.
>
> What concerns me is basically that on MacOS we're going through MVK, Metal and then apple's weird firmware driver systems. The bug lies somewhere down there, even if we triggered it by calling something unimplemented/incomplete/etc. We don't have a native metal backend yet, and I for one am not looking forward to debugging macOS kernel issues which kinda just ties my hands on this one for now.

Real panic unfortunately. I was suspecting myself that MVK should be handling this more gracefully if anything. I'll look into the pool descriptor size changes and run some test builds with that adjusted though as you suggested.

schm1dtmac avatar Aug 24 '25 18:08 schm1dtmac

> Real panic unfortunately.

Hmm, in this case, we may need to report it to apple directly, though I don't expect they will do anything about an emulator.

kd-11 avatar Aug 24 '25 18:08 kd-11

Yeah, descriptor size also isn't the issue it seems, tried a build with 256 and that panicked as well.

schm1dtmac avatar Aug 24 '25 19:08 schm1dtmac

For reference, on macOS Tahoe (developer beta 7) the system doesn't panic but otherwise still grinds to a halt and needs a manual reboot. It seems a bit more resistant to complete failure, but not by much (at least I can still move my cursor this time when everything locks up).

schm1dtmac avatar Aug 24 '25 22:08 schm1dtmac

I face the same issue on the latest version, 0.0.37-18115, on macOS on an M3 Air. I tried many different configs and used the lowest settings possible to run the game. It crashes or even makes the OS reboot. I only fixed the issue by downgrading to version 0.0.37-18025.

sajjad-fatehi avatar Aug 31 '25 18:08 sajjad-fatehi

You can try with https://github.com/RPCS3/rpcs3/pull/17427 though I expect it to probably crash even harder. We'll have to open communication with apple at some point about it, but we have no contacts. Emulators are tricky because corporations will usually just ignore any bug reports we open due to the (incorrectly) perceived legal grey zone they operate in.

kd-11 avatar Aug 31 '25 19:08 kd-11

> You can try with #17427 though I expect it to probably crash even harder. We'll have to open communication with apple at some point about it, but we have no contacts. Emulators are tricky because corporations will usually just ignore any bug reports we open due to the (incorrectly) perceived legal grey zone they operate in.

Last time I tried that PR it didn't actually panic, but the game effectively just froze up with graphical flickering/jitters (notably, RPCS3 didn't terminate either). But that was some commits back, so I'll try again now and report back if it's any different and provide logs.

schm1dtmac avatar Aug 31 '25 20:08 schm1dtmac

Never mind @kd-11, #17427 still causes panics in the same spot in GOW3 with the same settings. As usual, it's too difficult to realistically get a log from RPCS3, so I'm only able to provide a panic log: panic-full-2025-08-31-214128.0002.log

I also tried a few custom builds as a last-ditch attempt to see if anything would change: first I bumped the MVK version used in compiling to 1.4, then I moved the build to a Sequoia runner (hence forcing all deps to Sequoia versions). Neither made any difference at all.

(As a sidenote, do you think it'd be at all possible to implement some kind of 'safe mode'/'compatibility mode' for broken drivers like MacOS MoltenVK, so then at least systems aren't panicking and breaking, even if a degree of severe overhead is incurred? Just conjecture, I'm probably way out of my depth here though.)

schm1dtmac avatar Aug 31 '25 20:08 schm1dtmac

@schm1dtmac is that still an issue?

digant73 avatar Nov 01 '25 19:11 digant73

> @schm1dtmac is that still an issue?

Okay, this is seriously getting annoying, but yes, it is still an issue. For an issue of this magnitude I would definitely have reported here if it wasn't. (Granted, recent macOS builds have had other issues that make it hard enough to get in-game as-is, but when I have managed to get far enough, it still panics.)

schm1dtmac avatar Nov 01 '25 21:11 schm1dtmac

@schm1dtmac Disable the "fast" path here: https://github.com/RPCS3/rpcs3/blob/master/rpcs3/Emu/RSX/VK/VKProgramPipeline.cpp#L606 So that it always calls "create" instead of attempting a fast update operation. In the meantime we'll have to figure out how to get this reported to apple. A kernel panic must never arise from just using a graphics API.

kd-11 avatar Nov 10 '25 23:11 kd-11
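Editorial sketch of the kind of change kd-11 suggests above: a switch that makes every bind go through the "create" path rather than a cheaper in-place update. The contents of the linked line in VKProgramPipeline.cpp are not reproduced in this thread, so everything below is a toy model with hypothetical names (`DescriptorSetCache`, `allow_fast_update`, `bind`) and should not be read as RPCS3's actual logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cache that either reuses the current descriptor set ("fast" path)
// or builds a fresh one ("create" path). Counters stand in for the real work.
struct DescriptorSetCache {
    bool allow_fast_update = true;   // set to false to force the "create" path
    bool has_set = false;            // whether a set has been created yet
    std::uint64_t bound_layout = 0;  // layout hash of the current set
    int creates = 0;                 // number of "create" operations performed
    int fast_updates = 0;            // number of in-place "fast" updates performed

    void bind(std::uint64_t layout_hash) {
        assert(layout_hash != 0 && "0 is reserved for 'nothing bound'");

        // The fast path is only taken when it is enabled and the layout is unchanged.
        const bool can_reuse =
            allow_fast_update && has_set && bound_layout == layout_hash;

        if (can_reuse) {
            ++fast_updates;
        } else {
            ++creates;
            has_set = true;
            bound_layout = layout_hash;
        }
    }
};
```

With `allow_fast_update` set to `false`, two consecutive binds of the same layout yield two creates and no fast updates, which is the behaviour the test build was asked to have. The cost is extra allocation work on every bind.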

Maybe time to try KosmicKrisp? https://www.lunarg.com/lunarg-at-xdc-2025-kosmickrisp-overview/

kd-11 avatar Nov 10 '25 23:11 kd-11

> Maybe time to try KosmicKrisp? https://www.lunarg.com/lunarg-at-xdc-2025-kosmickrisp-overview/

I had seriously been considering KosmicKrisp last time I thought about this tbh. I didn't suggest it yet as it hasn't been formally released to the public (although it has apparently been merged into Mesa 26.0's codebase: https://www.phoronix.com/news/KosmicKrisp-Merged-Mesa-26.0).

> @schm1dtmac Disable the "fast" path here: https://github.com/RPCS3/rpcs3/blob/master/rpcs3/Emu/RSX/VK/VKProgramPipeline.cpp#L606 So that it always calls "create" instead of attempting a fast update operation. In the meantime we'll have to figure out how to get this reported to apple. A kernel panic must never arise from just using a graphics API.

I'll have a gander at that real quick 👍.

schm1dtmac avatar Nov 11 '25 00:11 schm1dtmac

@kd-11 Yeah, that didn't end up being quick haha, thanks to some Actions issues that stopped arm64 macOS builds. I finally got around to running a test with that change, and have had more trouble even getting far enough into gameplay to trigger a panic (90% of the time it freezes up earlier on; a desync-type freeze I think, as audio keeps playing in many cases). Still, I reproduced another kernel panic just now even with that suggested change.

schm1dtmac avatar Nov 15 '25 23:11 schm1dtmac

@schm1dtmac Have you submitted the report via the feedback/bug reporting tool? There's no chance of a private communication channel to get this looked at, so we just have to report it normally.

kd-11 avatar Nov 19 '25 17:11 kd-11