
Support Apple TSO mode enable bit

Open Sonicadvance1 opened this issue 3 years ago • 2 comments

Once the kernel has some sort of interface for enabling this flag (prctl? arch_prctl?), wire this up.

This should be as simple as changing the TSO IR ops to fall back to the "non-atomic" variants and adding a flag to the code cache config. M1/M1X is already significantly faster than Snapdragon even without this hardware feature enabled, so this would just be an improvement on already fast hardware.
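A minimal sketch of that idea, assuming hypothetical names throughout (`MemOp`, `CodeCacheConfig`, `HardwareTSO`, and `SelectMemOp` are illustrative and not FEX's actual IR or config types). The flag would presumably belong in the code cache config because code emitted with hardware TSO assumed is not valid on a thread running without it:

```cpp
#include <cstdint>

// Hypothetical IR memory ops; these names are illustrative only.
enum class MemOp : uint8_t { LoadTSO, StoreTSO, Load, Store };

// Hypothetical code cache config. Cached code would need to be keyed on
// this flag, since non-atomic variants are only correct under hardware TSO.
struct CodeCacheConfig {
  bool HardwareTSO = false;
};

// When hardware TSO is active, TSO ops fall back to the plain
// ("non-atomic") variants; otherwise every op is left unchanged.
constexpr MemOp SelectMemOp(MemOp op, const CodeCacheConfig& cfg) {
  if (!cfg.HardwareTSO) {
    return op;
  }
  switch (op) {
    case MemOp::LoadTSO:  return MemOp::Load;
    case MemOp::StoreTSO: return MemOp::Store;
    default:              return op;
  }
}
```

With the flag clear, the JIT would keep emitting whatever it emits today for the TSO ops, so this change would be opt-in per code cache.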

TODO: Is there a way to make non-TSO loads/stores happen while this TSO flag is still enabled? For example, loads from our context, TLS, and the stack accesses that we already convert to non-TSO. Would need someone with hardware to test. Worst case we just eat the TSO cost always, which isn't terrible.

TODO: Hopefully the kernel interface is per thread, so our helper threads don't pay the TSO cost, since they don't need it.
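If the interface does end up being a per-thread prctl, the wiring could look roughly like the sketch below. Everything specific here is an assumption: the option number and mode values are placeholders rather than confirmed kernel constants, and `SetThreadTSO` and `PrctlFn` are hypothetical names. The kernel call is injectable so the logic can be exercised without the hardware.

```cpp
#include <functional>
#include <sys/prctl.h>

// Placeholder values (assumptions): the real option number and arguments
// depend on whatever interface the kernel ends up exposing.
constexpr int kAssumedSetMemModel = 0x4d4d444c;
constexpr unsigned long kAssumedModelDefault = 0;
constexpr unsigned long kAssumedModelTSO = 1;

using PrctlFn = std::function<int(int, unsigned long)>;

// Thin wrapper over the real prctl(2) syscall.
inline int RealPrctl(int option, unsigned long arg) {
  return prctl(option, arg, 0UL, 0UL, 0UL);
}

// Requests TSO (or the default model) for the calling thread only.
// Returns false when the kernel rejects the request, in which case the
// caller keeps using the software TSO path.
inline bool SetThreadTSO(bool enable, const PrctlFn& call = RealPrctl) {
  const unsigned long model = enable ? kAssumedModelTSO : kAssumedModelDefault;
  return call(kAssumedSetMemModel, model) == 0;
}
```

Under this sketch only threads that execute guest code would call `SetThreadTSO(true)`; helper threads would never make the call and so would stay in the default memory model.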

Sonicadvance1, Mar 12 '22 20:03

The interface will obviously be per-thread; it doesn't make any sense as a process flag. I don't think there's any way to bypass TSO, but AIUI it also has very little actual performance overhead in the M1 architecture. If you are doing a bunch of computation (JIT?) where the overhead of the system call is worth turning TSO off, you could do that (the actual system call body will be very lightweight, so you're mostly paying only the baseline syscall cost).
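One way to structure that kind of temporary opt-out is a scope guard, sketched here under assumptions: `ScopedTSODisable` and `SetTSOFn` are hypothetical names, and the callable stands in for whatever per-thread toggle the kernel eventually provides.

```cpp
#include <functional>
#include <utility>

// Hypothetical per-thread toggle: takes the desired TSO state and
// returns true if the change was applied.
using SetTSOFn = std::function<bool(bool)>;

// Turns TSO off for the current scope and restores it on exit.
// Each guard costs two toggle calls (two syscalls in practice), so it
// only pays off around a long stretch of host-side work.
class ScopedTSODisable {
public:
  explicit ScopedTSODisable(SetTSOFn set)
      : set_(std::move(set)), disabled_(set_(false)) {}

  ~ScopedTSODisable() {
    // Only re-enable if the disable actually took effect.
    if (disabled_) {
      set_(true);
    }
  }

  ScopedTSODisable(const ScopedTSODisable&) = delete;
  ScopedTSODisable& operator=(const ScopedTSODisable&) = delete;

private:
  SetTSOFn set_;   // declared first so it is initialized before disabled_
  bool disabled_;
};
```

If the toggle fails, the guard does nothing on exit, so the thread simply stays in whatever mode it was already in.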

marcan, Apr 30 '22 16:04

Per-thread TSO enablement is what makes sense to us as well. Per-process would mean our helper threads pay a TSO cost for no reason.

Sonicadvance1, Apr 30 '22 18:04