cmdstan
-fwhole-program-vtables in STAN_CPP_OPTIMS breaks linkage on Apple Silicon
Summary
When STAN_CPP_OPTIMS is activated on Apple M1 machines, Stan models fail to link. Leaving STAN_CPP_OPTIMS deactivated and instead setting custom CXXFLAGS that include every clang flag from STAN_CPP_OPTIMS except -fwhole-program-vtables solves the problem.
Description
I am running native arm64 R version 4.1.1 (aarch64-apple-darwin20.5.0) as compiled by Homebrew. I use CmdStan via the cmdstanr R package, but this issue should apply to any instance of CmdStan on Apple M1 machines (when trying to compile natively).
After activating STAN_CPP_OPTIMS in make/local, Stan programs fail to link with the following error:
Compiling Stan program...
0 0x1004fc224 __assert_rtn + 128
1 0x1005017e8 ld::tool::OutputFile::addressAndTarget(ld::Internal const&, ld::Fixup const*, ld::Atom const**) (.cold.1) + 0
2 0x10043b104 ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) + 252
3 0x10043cdfc ld::tool::OutputFile::applyFixUps(ld::Internal&, unsigned long long, ld::Atom const*, unsigned char*) + 4004
4 0x100441540 ld::tool::OutputFile::writeAtoms(ld::Internal&, unsigned char*) + 356
5 0x100438fa4 ld::tool::OutputFile::writeOutputFile(ld::Internal&) + 408
6 0x100431adc ld::tool::OutputFile::write(ld::Internal&) + 216
7 0x1003bf1d8 main + 584
A linker snapshot was created at: /tmp/model-5be91a73c035-2021-08-24-141053.ld-snapshot
ld: Assertion failed: (_mode == modeFinalAddress), function finalAddress, file ld.hpp, line 1190.
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [/var/folders/dx/0v_99lg92db617tjgqfk2y6c0000gn/T/Rtmp33HE2D/model-5be91a73c035] Error 1
Error: An error occured during compilation! See the message above for more information.
I tested adding each of the clang flags in STAN_CPP_OPTIMS to CXXFLAGS one by one. Every flag works except -fwhole-program-vtables (in CXXFLAGS_FLTO, line 71 of cmdstan/makefile).
As mentioned above, a functional workaround is to use CXXFLAGS+= -fvectorize -ftree-vectorize -fslp-vectorize -ftree-slp-vectorize -fno-standalone-debug -fstrict-return -funroll-loops -flto=full -fstrict-vtable-pointers -fforce-emit-vtables in make/local: every flag in STAN_CPP_OPTIMS except -fwhole-program-vtables.
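For readability, the same workaround can be laid out as a make/local fragment. This is only a sketch: the flags are exactly those listed above, but splitting them across three lines and the comments describing each group are my own assumptions, not how the cmdstan makefile organises them.

```make
# make/local (sketch): keep STAN_CPP_OPTIMS commented out
# STAN_CPP_OPTIMS=true

# Vectorisation flags from the report
CXXFLAGS += -fvectorize -ftree-vectorize -fslp-vectorize -ftree-slp-vectorize
# Remaining non-LTO flags from the report
CXXFLAGS += -fno-standalone-debug -fstrict-return -funroll-loops
# LTO-related flags, deliberately omitting -fwhole-program-vtables
CXXFLAGS += -flto=full -fstrict-vtable-pointers -fforce-emit-vtables
```

After editing make/local, CmdStan needs a clean rebuild for the new flags to take effect.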
I tried several other possible workarounds in make/local that did not fix the problem:
- activating STAN_CPP_OPTIMS but using CXXFLAGS+= -fno-whole-program-vtables
- activating STAN_CPP_OPTIMS (or the equivalent CXXFLAGS) and adding LDFLAGS = -fwhole-program-vtables
- activating STAN_CPP_OPTIMS (or the equivalent CXXFLAGS) and setting LDFLAGS to the whole list of CXXFLAGS_FLTO in line 71 of cmdstan/makefile
Perhaps it makes sense to check for arm64 and exclude -fwhole-program-vtables in this case?
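A rough sketch of what such a check might look like in GNU make follows. It is hypothetical and not taken from the cmdstan makefile: it assumes CXXFLAGS_FLTO has already been assigned earlier in the file, and it detects the platform by shelling out to uname, which is my choice rather than anything the build system is known to do.

```make
# Hypothetical sketch: drop -fwhole-program-vtables on native Apple Silicon.
# Assumes CXXFLAGS_FLTO was assigned before this point.
ifeq ($(shell uname -s),Darwin)
ifeq ($(shell uname -m),arm64)
# filter-out removes the exact word and leaves the other flags untouched
CXXFLAGS_FLTO := $(filter-out -fwhole-program-vtables,$(CXXFLAGS_FLTO))
endif
endif
```

Note that uname -m reports x86_64 when make runs under Rosetta, so this check only triggers for native arm64 builds, which is the case described here.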
Reproducible Steps
- Copy the make/local template to make/local.
- Uncomment the STAN_CPP_OPTIMS line.
- Clean and rebuild (cmdstanr::rebuild_cmdstan()).
- Try to compile any Stan model (cmdstanr::cmdstan_model(…)).
Current Output
See above for the error message. Because the Stan model fails to link, there is no other output: there is no model binary to use for sampling.
Expected Output
Stan models should link without errors.
Additional Information
Rebuilding cmdstan yields the following warnings, regardless of the make/local setup.
ld: warning: cannot export hidden symbol typeinfo for tbb::tbb_exception from task_group_context.o
ld: warning: cannot export hidden symbol typeinfo name for tbb::tbb_exception from task_group_context.o
ld: warning: cannot export hidden symbol typeinfo name for tbb::empty_task from arena.o
ld: warning: cannot export hidden symbol typeinfo for tbb::empty_task from arena.o
ld: warning: cannot export hidden symbol typeinfo name for tbb::empty_task from scheduler.o
ld: warning: cannot export hidden symbol typeinfo name for tbb::tbb_exception from scheduler.o
ld: warning: cannot export hidden symbol typeinfo for tbb::empty_task from scheduler.o
ld: warning: cannot export hidden symbol typeinfo for tbb::tbb_exception from scheduler.o
TBB was a headache for M1 machines all over the place (see here, for example). I’m not good enough with the guts of LLVM to understand link-time optimisation well, but could it be that these hidden TBB symbols are breaking the whole-program LTO?
Current Version
v2.27.0
Thanks for the report!
I think the simplest solution will be to exclude this flag for arm64.
Has this issue been resolved?
It does seem to be working for me now on an M1 using just STAN_CPP_OPTIMS=true, although I haven't checked carefully to see what changed in the underlying code.
I don’t believe we have made any intentional change, but it’s possible OS updates have improved the M1’s parity with x86 in ways that helped here.