
Native backend produces incorrect code with LLVM 8... but only on MacOS?

Open TravisWhitaker opened this issue 6 years ago • 2 comments

This is bound to be a fun one: one of my larger kernels is producing incorrect results when compiled with accelerate-llvm-native on MacOS when using LLVM 8. The same kernel agrees with the interpreter on all my Linux machines, and also agrees with the interpreter on my Mac when I use LLVM 7.

I've got two working theories:

  • I think the only platform-specific part of accelerate-llvm-native is the linker, so perhaps there's some problem there? Although I can't imagine what LLVM would break from 7 to 8 that would cause a linker to do the wrong thing...
  • My Mac is a 2018 MBP with an Intel i9. All the Linux machines I've tested on have i7s. The problem kernel does a huge number of floating point ops, so perhaps some optimization LLVM 8 is doing changes the numerical stability in a way that doesn't matter on an i7, but does matter on an i9? That certainly wouldn't be the weirdest numerical stability issue I've run into...

Once I've got a few spare hours I'll begin pruning down a minimal repro.

TravisWhitaker • Jun 02 '19 18:06

I was able to test the same kernel on a Mac with an Intel m3, so the wacky i9 hypothesis is out. Seems it is indeed MacOS-related.

TravisWhitaker • Jun 03 '19 16:06

hm, that will indeed be fun to debug.

It'd be useful to compare the object files (.o) generated by LLVM 8 and LLVM 7, and see whether 8 is producing some new or unusual instructions or relocations that the linker isn't dealing with properly.
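One rough way to do that comparison is to disassemble both object files to text and list the instruction mnemonics and relocation types that appear only in the newer dump. The sketch below assumes the dumps were produced with something like `llvm-objdump -d -r` and that each instruction line looks like `address: bytes mnemonic operands`. The helper names (`summarize`, `only_in_new`) and the sample dump text are made up for illustration and are not taken from the actual kernel.

```python
import re
from collections import Counter

# Instruction line: "<hex address>: <hex bytes> <mnemonic> ...".
# This layout is an assumption about the disassembler's text output.
INSN = re.compile(r"^\s*[0-9a-f]+:\s+(?:[0-9a-f]{2}\s)+\s*([a-z][a-z0-9.]*)")

# Relocation type names as used for x86-64 in Mach-O and ELF objects.
RELOC = re.compile(r"\b(X86_64_RELOC_\w+|R_X86_64_\w+)\b")


def summarize(disassembly):
    """Count instruction mnemonics and relocation types in a text dump."""
    counts = Counter()
    for line in disassembly.splitlines():
        match = INSN.match(line)
        if match:
            counts[match.group(1)] += 1
        for reloc in RELOC.findall(line):
            counts[reloc] += 1
    return counts


def only_in_new(old_dump, new_dump):
    """Return mnemonics/relocation types present in new_dump but not old_dump."""
    return sorted(set(summarize(new_dump)) - set(summarize(old_dump)))


# Made-up sample dumps, standing in for the LLVM 7 and LLVM 8 object files.
llvm7_dump = """
       0:\t55\tpushq\t%rbp
       1:\t48 89 e5\tmovq\t%rsp, %rbp
"""
llvm8_dump = """
       0:\t55\tpushq\t%rbp
       1:\tc4 e2 f1 a9 c2\tvfmadd213sd\t%xmm2, %xmm1, %xmm0
\t\t0000000000000009:  X86_64_RELOC_GOT_LOAD\t_foo@GOTPCREL
"""

if __name__ == "__main__":
    for name in only_in_new(llvm7_dump, llvm8_dump):
        print(name)
```

Anything this reports is only a starting point: a relocation type that shows up only in the LLVM 8 dump would be the first thing to check against what the custom linker handles.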

I'd like to replace these custom linkers with the functions from llvm/llvm-hs, although last time I checked (quite a while ago now) the llvm-hs bindings weren't working for what I needed; seems like it's time to reinvestigate.

tmcdonell • Jun 06 '19 08:06