Just in time OBJ/IR Caching
Overview
Caches IR and OBJ code as it gets compiled, in a thread-safe and multi-process-safe way.
This started as a cleanup of the loading interface and so carries a lot of cruft; I then cherry-picked changes in from #1548.
Reading from the caches takes a shared_lock, and that's it.
Writing to the caches takes a unique_lock, plus an fcntl advisory lock when the index or the data files are resized.
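A minimal sketch of that locking scheme, assuming a `std::shared_mutex` for the in-process locks and a whole-file `fcntl` write lock around the resize. `CacheSketch`, its members, and the in-memory `std::map` standing in for the index are all hypothetical and not the types used in this PR.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cstdint>
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>

// Hypothetical cache wrapper; names are illustrative, not FEX's actual types.
class CacheSketch {
public:
  explicit CacheSketch(int fd) : Fd(fd) {}

  // Readers only take the in-process shared lock.
  std::optional<uint64_t> Lookup(uint64_t Key) const {
    std::shared_lock Guard(Mutex);
    auto It = Entries.find(Key);
    if (It == Entries.end()) {
      return std::nullopt;
    }
    return It->second;
  }

  // Writers take the unique lock. A non-zero NewFileSize requests a resize,
  // which additionally takes the fcntl advisory lock so that other
  // cooperating processes do not resize the same file concurrently.
  bool Insert(uint64_t Key, uint64_t Value, off_t NewFileSize) {
    std::unique_lock Guard(Mutex);
    if (NewFileSize > 0 && !ResizeLocked(NewFileSize)) {
      return false;
    }
    Entries[Key] = Value;
    return true;
  }

private:
  bool ResizeLocked(off_t NewFileSize) {
    struct flock Lock {};
    Lock.l_type = F_WRLCK;
    Lock.l_whence = SEEK_SET;
    Lock.l_start = 0;
    Lock.l_len = 0; // 0 means "to end of file", i.e. the whole file
    if (fcntl(Fd, F_SETLKW, &Lock) == -1) {
      return false;
    }
    const bool Ok = ftruncate(Fd, NewFileSize) == 0;
    Lock.l_type = F_UNLCK;
    fcntl(Fd, F_SETLK, &Lock);
    return Ok;
  }

  int Fd;
  mutable std::shared_mutex Mutex;
  std::map<uint64_t, uint64_t> Entries; // stand-in for the real index
};
```

Note that `fcntl` locks are advisory: they only protect against other processes that take the same lock, which is why the in-process mutex is still needed for threads.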
Index files are re-mapped as they grow; they grow in 64 KB chunks.
Data files are mapped in chunks of up to 16 MB.
Settings have been cleaned up. AOTIRLoad and AOTIRCapture have been merged to IRCache. AOTIRGenerate has been completely removed, and a separate executable, FEXAOTGen is used for that.
To enable caching, use
"IRCache": "auto"/"disabled"/"read"/"write"/"readwrite",
"ObjCache": "auto"/"disabled"/"read"/"write"/"readwrite"
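For illustration, the five option strings could be parsed into an enum along these lines. This is only a sketch: `CacheMode`, `ParseCacheMode`, `AllowsRead`, and `AllowsWrite` are hypothetical names, and what "auto" resolves to is not stated here, so it is left unresolved.

```cpp
#include <optional>
#include <string_view>

// Hypothetical enum mirroring the five documented option strings.
enum class CacheMode { Auto, Disabled, Read, Write, ReadWrite };

// Returns std::nullopt for any string that is not one of the documented values.
std::optional<CacheMode> ParseCacheMode(std::string_view Value) {
  if (Value == "auto") return CacheMode::Auto;
  if (Value == "disabled") return CacheMode::Disabled;
  if (Value == "read") return CacheMode::Read;
  if (Value == "write") return CacheMode::Write;
  if (Value == "readwrite") return CacheMode::ReadWrite;
  return std::nullopt;
}

// Helpers for the explicit modes only. Auto returns false in both because
// its resolution is an open assumption; a caller would resolve it first.
constexpr bool AllowsRead(CacheMode Mode) {
  return Mode == CacheMode::Read || Mode == CacheMode::ReadWrite;
}

constexpr bool AllowsWrite(CacheMode Mode) {
  return Mode == CacheMode::Write || Mode == CacheMode::ReadWrite;
}
```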
Todo
- [x] Rework in a generic index file
- [x] Cleanup & Implement configuration
- [x] Cleanup object cache / debug changes
- [x] See what broke tests
- [x] Update Summary
- [ ] Investigate glmark2 rare crash
- [ ] gimp also crashes
Follow Up
- [ ] Investigate adding some form of balancing in the BSTs?
- [ ] Investigate adding a sorted array mode if the BSTs are not dirty
Among other things, this also hashes the actual block ranges, not (start, end). It also limits the blocks to a single vma mapping. So we don't compile code from multiple files into one cached block.
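To make the difference from hashing a single (start, end) span concrete, here is a sketch that hashes only the bytes covered by each block range, so bytes in the gaps between ranges do not affect the result. FNV-1a is used purely as a stand-in hash; the hash function actually used by this PR is not stated here, and `CodeRange` and `HashRanges` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one contiguous piece of guest code in a block.
struct CodeRange {
  const uint8_t* Start;
  size_t Length;
};

// Hash the bytes of every range, in order, with 64-bit FNV-1a.
// Each range's length is mixed in first so that differently split ranges
// over the same concatenated bytes do not trivially collide.
uint64_t HashRanges(const std::vector<CodeRange>& Ranges) {
  uint64_t Hash = 14695981039346656037ull; // FNV-1a offset basis
  auto Mix = [&Hash](uint8_t Byte) {
    Hash ^= Byte;
    Hash *= 1099511628211ull; // FNV-1a prime
  };
  for (const CodeRange& Range : Ranges) {
    const uint64_t Length = Range.Length;
    for (size_t i = 0; i < sizeof(Length); ++i) {
      Mix(static_cast<uint8_t>(Length >> (i * 8)));
    }
    for (size_t i = 0; i < Range.Length; ++i) {
      Mix(Range.Start[i]);
    }
  }
  return Hash;
}
```

With this shape, a block made of two ranges separated by unrelated bytes keeps the same hash when only the unrelated bytes change, while any change inside a range changes it.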
Implements #798 with a brute force approach
Also, it inserts ranges for cached blocks for SMC invalidation, though only after hashing them (Follow up: #1963)
(Refactor broke this branch, will fix tomorrow)
Steam is stable again :)
(now with OBJ support for x86 and arm64, always enabled for now)
This has been rebased on top of main, and most bugs have been fixed. I plan to spend tomorrow on cleanups as well.
Would be good to get some feedback as-is @Sonicadvance1 @neobrain ~
Would be nice if this passed CI before reviewing thoroughly.
The CI failures are unrelated to the main logic; the initialization for the TestHarness just hasn't been written yet. I'll be pushing a fix in a bit.
This has been cleaned up, with most dead/stale code removed.
Follow up
- [x] Fully postfix cache folders with compilation modal options
- [x] Make AOTGenerate a fexloader-only option
- [x] Fix FEXUpdateIRCache.sh, add FEXUpdateObjCache.sh, cleanup scripts?
- [x] Compact relocations to actual size
Follow Up:
- [x] Generate relocations in relocation pools, import cached files directly to code cache (#1939, #1332)
All tasks completed on this? Merge conflict is still here.
(This has been squashed and rebased to main)
After much, much debugging and fixing several other bugs, it looks like this is a race condition that goes away when code isn't compiled 'too fast'.
So far I've verified that it is not
- Bad code stored in cache
- Bad code read from cache
- Cache index/data corruption
@Sonicadvance1 thoughts on disabling/ignoring the Visual Debugger for now? It is largely broken at this point; all of the APIs I've disabled here were already broken before.
Otherwise, I'll do another bug hunting spree for this tonight, and if the bug is not found I think it's best to merge and keep experimental til 2210.
Bug fixed, ranges were not serialized correctly.
Follow Up
- Performance drops a little with that, though we can work around it in different ways in the future, such as whole page hashing (#1961)
Now investigating the interpreter failures.
With the last round of fixes this passes all asm tests for me locally.
I'll take another look tomorrow to cleanup things and get everything ready for merge.
IR tests will have to be updated for new OP_BREAK semantics
Looks like all the fixes have managed to round up smc issues outlined in #1754 at last.
Based on my testing, parallels/m1 is ~60% likely to fail on that test, with stale code running forever.
Oddly enough, it doesn't repro in orion
The smc issues don't repro with gdb attached, so this is likely yet another race somewhere between multi-threaded invalidation and translation.
Also, the smc test crashes with objc enabled, possibly with irc as well. While not a blocker for now, possibly follow up?
Hmm, looks like that resolved the SMC issue, though we got another spurious failure in pthread_cancel in the 8.4 runner. I've seen this one before, though I haven't investigated it. Logged as follow up for https://github.com/FEX-Emu/FEX/issues/1754#issuecomment-1232677973 and will re-run CI here.
Generated #1958 for the smc test crash with objc enabled.
Some performance numbers from Orion
No Cache
skmp@ornio:~/projects/FEX/build$ time Bin/FEXLoader /bin/ls > /dev/null
real 0m0,277s
user 0m0,249s
sys 0m0,025s
OBJCache
skmp@ornio:~/projects/FEX/build$ time Bin/FEXLoader /bin/ls > /dev/null
[Info] Warning: OBJ/IR Caches are experimental, and might lead to crashes.
real 0m0,029s
user 0m0,012s
sys 0m0,015s
Native
skmp@ornio:~/projects/FEX/build$ time /bin/ls > /dev/null
real 0m0,007s
user 0m0,000s
sys 0m0,007s
Best OBJCache result so far
skmp@ornio:~/projects/FEX/build$ time Bin/FEXLoader /bin/true > /dev/null
[Info] Warning: OBJ/IR Caches are experimental, and might lead to crashes.
real 0m0,014s
user 0m0,009s
sys 0m0,005s
(Closing this as there is an ongoing powergrab by @Sonicadvance1, I will migrate my work to https://github.com/skmp/fex-emu-ng.git)