[🐧] BizInvoker does not cause gc mode switch without compatibility argument
Calling a [BizImport]ed function that doesn't use Compatibility = true will not cause a gc mode change.
This is relevant because mono relies on each thread having the correct gc mode to perform garbage collection effectively. If a thread is assumed to be in gc unsafe mode (executing managed code), the garbage collector will wait for the thread to reach a safe point and suspend itself so garbage collection can be performed, but if an unmanaged function is being executed this safe point will only be reached once the function returns back to managed code, which may never happen. This initially came up in #4117 where garbage collection would deadlock the entire program because garbage collection was waiting on the core thread to suspend itself while the core thread was waiting on a command from the frontend (while the frontend thread was already suspended).
Aside from the possibility to cause deadlocks this might also negatively affect performance across the board because gc's are being stalled while such functions are being executed.
Is this a mono bug? The bizinvoke call in is just calli with an unmanaged fn pointer, and we tell the runtime that it's an unmanaged ptr:
https://github.com/TASEmulators/BizHawk/blob/7a89e1eabf1f7819890dec5eff0d6047f7d48618/src/BizHawk.BizInvoke/BizInvoker.cs#L408
If there's supposed to be any GC barrier there, the JIT should be inserting it.
I don't know if it's a bug, just a consequence of how mono works. Mono only inserts gc transitions in specific predefined places, like P/Invoke calls: https://github.com/wine-mono/mono/blob/3cc168b49a466de355a30dd1ee25f2e907c0fbaa/mono/metadata/marshal-ilgen.c#L1997
Maybe we could manually replicate that logic by inserting mono_threads_enter_gc_safe_region and mono_threads_exit_gc_safe_region calls around the calli instruction when running under mono.
How does mono handle C# function pointers here? (e.g. delegate* unmanaged[Cdecl], those use calli underneath them)
How does mono handle C# function pointers here? (e.g.
delegate* unmanaged[Cdecl], those use calli underneath them)
Assuming I implemented it correctly these suffer from the exact same problem; no gc state transition happens.
Sounds like it would just be a mono bug here, annoyingly (not doing a gc state transition should only happen here if SuppressGCTransition is applied to the unmanaged C# function pointer, without it there should be a GC transition as one would expect with an unmanaged call).
EDIT: Looks like this was properly implemented within .NET Runtime's mono fork: https://github.com/dotnet/runtime/issues/38480 / https://github.com/dotnet/runtime/pull/46491
EDIT EDIT: ... and implemented in wine-mono too: https://github.com/wine-mono/mono/commit/5bf7d8ba96d0ca3c81627435acb9cc53121616b3 (what version of mono exactly was being used to test here?)
It's broken in mono 6.8 and 6.12 (and probably all versions inbetween). It's good to hear that it's finally been fixed with mono 6.14, but it will probably take some time until that version is readily available everywhere.
You need more precision in the version number. For reference our (undocumented) floor was 6.12.0.151, but I dropped this check the other day because it's been so long. Choosing 6.14.0 as the new floor would be fine. It's in Nixpkgs >= 25.05.
Off-topic, but apparently the ~~new ownership~~ sole new maintainer (!) reversed course and updated libgdiplus with Mono 6.14.0, so that's worth investigating.
You need more precision in the version number.
Not in this case because there's not a single 6.12.* that has the fix. 6.14 apparently released with many previously committed things that were never put into a release.
Still, what do we do about non-nix? Tell every user "update to mono 6.14 or bizhawk might randomly crash"? I for one don't know how I would even install mono 6.14 right now without compiling from source or upgrading the full distribution.
Lacking a GC switch probably won't actually cause some random crash, since all that means the thread is stuck waiting for a safe point until the unmanaged function returns. The worst is a deadlock like the one described, although that would only happen if you have some thread call an unmanaged function via calli and never return, which is a very rare case in BizHawk to occur at all (especially with most of the unmanaged cores being wbx cores where such a scenario can't happen, and such a scenario can be worked around by Compatibility = true so it isn't unavoidable). Outside of that deadlock scenario the worst that happens is theoretically worse performance (granted the performance loss would be very minor practically, especially since BizHawk is largely single-threaded and the main thread is the one thread that practically ever uses the BizInvoker, and might be balanced out by calli anyways)
The most that practically really needs to be done by us is probably just a doc update so devs don't trigger a deadlock scenario (i.e. devs do Compatibility = true for a function intended to never return). No reason to bump up the minimum Mono requirement as long as we apply the workaround when needed (which should be very very rarely)
Still, what do we do about non-nix? Tell every user "update to mono 6.14 or bizhawk might randomly crash"? I for one don't know how I would even install mono 6.14 right now without compiling from source or upgrading the full distribution.
Yes, everyone will need to update, which they should be doing regularly anyway. For distros that haven't shipped the new version yet, you can install Nix on non-NixOS, or even use Nix-built packages without Nix. Or just avoid the features that crash (is it just Mupen atm?).
Or just avoid the features that crash (is it just Mupen atm?).
As far as I'm aware it's just the mupen core yes, but how do I "avoid the features that crash" when it affects the entire core?
granted the performance loss would be very minor practically
In the commit setting MONO_THREADS_SUSPEND=preemptive yoshi noted that that resulted in improved performance, so I don't think it's just a minor gain.
I realize that if mono is the problem and a newer version has a fix then the solution is to update mono, but that isn't always practical or possible, and there should be something that guarantees functionality even if an older mono version is used (which happens to be the reality in most major distros as the moment). Something like d1fd6c30b98665b4ca9f53d0e763c444fed73f08 which is redundant for newer versions also.
The only loss of functionality with Compatibility=true should be in waterbox scenarios:
https://github.com/TASEmulators/BizHawk/blob/7a89e1eabf1f7819890dec5eff0d6047f7d48618/src/BizHawk.BizInvoke/BizInvoker.cs#L240
In all other cases, it is just a more compatible, and presumably slightly slower, way to go. So we could just spray compatibility mode all over mupen.
This is an unfortunate situation all around, but the bug is clearly on mono's side.