Native backend produces incorrect code with LLVM 8... but only on MacOS?
This is bound to be a fun one: one of my larger kernels is producing incorrect results when compiled with accelerate-llvm-native on MacOS when using LLVM 8. The same kernel agrees with the interpreter on all my Linux machines, and also agrees with the interpreter on my Mac when I use LLVM 7.
I've got two working theories:
- I think the only platform-specific part of accelerate-llvm-native is the linker, so perhaps there's some problem there? Although I can't imagine what LLVM would break from 7 to 8 that would cause a linker to do the wrong thing...
- My Mac is a 2018 MBP with an Intel i9. All the Linux machines I've tested on have i7s. The problem kernel does a huge number of floating point ops, so perhaps some optimization LLVM 8 is doing changes the numerical stability in a way that doesn't matter on an i7, but does matter on an i9? That certainly wouldn't be the weirdest numerical stability issue I've run into...
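To make the second theory concrete: floating point addition is not associative, so any optimization that reorders a long chain of operations can change the result. The sketch below is only an illustration of that effect in plain Python. The function name `reassociation_gap` is made up for this example, and nothing here shows that LLVM 8 actually reorders anything in the kernel.

```python
def reassociation_gap(a, b, c):
    """Return (a + b) + c and a + (b + c), which may differ in floating point."""
    left = (a + b) + c   # evaluate left to right
    right = a + (b + c)  # same terms, different grouping
    return left, right


if __name__ == "__main__":
    # Small values: the two groupings differ in the last bit.
    print(reassociation_gap(0.1, 0.2, 0.3))
    # Mixed magnitudes: the small term can be lost entirely in one grouping.
    print(reassociation_gap(1e16, -1e16, 1.0))
```

If the kernel's results differ from the interpreter only by amounts like the first case, reordering is a plausible suspect. Differences that are large or structured would point elsewhere.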
Once I've got a few spare hours I'll begin pruning down a minimal repro.
I was able to test the same kernel on a Mac with an Intel m3, so the wacky i9 hypothesis is out. Seems it is indeed MacOS-related.
hm, that will indeed be fun to debug.
It'd be useful to compare the object (.o) files generated by LLVM 8 and LLVM 7, and see if 8 is producing some new/unusual instructions or relocations which the linker isn't dealing with properly.
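A rough sketch of how that comparison could be automated, assuming you have already saved a text disassembly of each object file (e.g. captured from `objdump -d`). The helper names `mnemonics` and `new_mnemonics` are hypothetical, and the parser assumes the tab-separated `address:`, raw bytes, instruction layout. Other disassemblers format their output differently and would need a different split. It only looks at instructions, so relocations would still need to be compared separately.

```python
from collections import Counter


def mnemonics(disassembly):
    """Count instruction mnemonics in a tab-separated disassembly listing."""
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Assumed layout: "<address>:\t<raw bytes>\t<mnemonic> <operands>".
        # Headers, labels and byte-only continuation lines are skipped.
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        instruction = fields[2].split()
        if instruction:
            counts[instruction[0]] += 1
    return counts


def new_mnemonics(old_listing, new_listing):
    """Return mnemonics that appear in new_listing but never in old_listing."""
    return sorted(set(mnemonics(new_listing)) - set(mnemonics(old_listing)))
```

Calling `new_mnemonics(llvm7_text, llvm8_text)` on the two listings gives a short list of instructions worth looking at first. An empty list would suggest the difference lies in relocations or layout instead.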
I'd like to replace these custom linkers with the functions from llvm/llvm-hs, although last time I checked (quite a while ago now) the llvm-hs bindings weren't working for what I needed; seems time to reinvestigate.