-fwhole-program-vtables in STAN_CPP_OPTIMS breaks linkage on Apple Silicon

Open jaburgoyne opened this issue 3 years ago • 4 comments

Summary

With STAN_CPP_OPTIMS activated on Apple M1 machines, Stan models fail to link. Leaving STAN_CPP_OPTIMS deactivated and instead setting custom CXXFLAGS that include every clang flag from STAN_CPP_OPTIMS except -fwhole-program-vtables avoids the problem.

Description

I am running native arm64 R version 4.1.1 (aarch64-apple-darwin20.5.0) as compiled by Homebrew. I use CmdStan via the cmdstanr R package, but this issue should apply to any instance of CmdStan on Apple M1 machines (when trying to compile natively).

After activating STAN_CPP_OPTIMS in make/local, Stan programs fail to link with the following error:

Compiling Stan program...

0 0x1004fc224 __assert_rtn + 128
1 0x1005017e8 ld::tool::OutputFile::addressAndTarget(ld::Internal const&, ld::Fixup const*, ld::Atom const**) (.cold.1) + 0
2 0x10043b104 ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) + 252
3 0x10043cdfc ld::tool::OutputFile::applyFixUps(ld::Internal&, unsigned long long, ld::Atom const*, unsigned char*) + 4004
4 0x100441540 ld::tool::OutputFile::writeAtoms(ld::Internal&, unsigned char*) + 356
5 0x100438fa4 ld::tool::OutputFile::writeOutputFile(ld::Internal&) + 408
6 0x100431adc ld::tool::OutputFile::write(ld::Internal&) + 216
7 0x1003bf1d8 main + 584

A linker snapshot was created at: /tmp/model-5be91a73c035-2021-08-24-141053.ld-snapshot

ld: Assertion failed: (_mode == modeFinalAddress), function finalAddress, file ld.hpp, line 1190.

clang: error: linker command failed with exit code 1 (use -v to see invocation)

make: *** [/var/folders/dx/0v_99lg92db617tjgqfk2y6c0000gn/T/Rtmp33HE2D/model-5be91a73c035] Error 1

Error: An error occured during compilation! See the message above for more information.

I tested adding each of the clang flags in STAN_CPP_OPTIMS to CXXFLAGS one by one. Every flag works except -fwhole-program-vtables (in CXXFLAGS_FLTO, line 71 of cmdstan/makefile).

As mentioned above, a functional workaround is to leave STAN_CPP_OPTIMS deactivated and add CXXFLAGS+= -fvectorize -ftree-vectorize -fslp-vectorize -ftree-slp-vectorize -fno-standalone-debug -fstrict-return -funroll-loops -flto=full -fstrict-vtable-pointers -fforce-emit-vtables to make/local, that is, every clang flag in STAN_CPP_OPTIMS except -fwhole-program-vtables.
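For reference, the workaround would look roughly like this as a make/local fragment. The flag list is copied from the text above. Splitting it over continuation lines is only for readability, and the commented-out STAN_CPP_OPTIMS line is an assumption about how the template is laid out.

```make
# make/local: sketch of the workaround described above.
# Leave STAN_CPP_OPTIMS commented out so the stock flag set is not applied.
# STAN_CPP_OPTIMS=true

# Every clang flag from STAN_CPP_OPTIMS except -fwhole-program-vtables.
CXXFLAGS += -fvectorize -ftree-vectorize -fslp-vectorize -ftree-slp-vectorize \
            -fno-standalone-debug -fstrict-return -funroll-loops \
            -flto=full -fstrict-vtable-pointers -fforce-emit-vtables
```

After editing make/local, CmdStan needs a clean rebuild before the new flags take effect.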

I tried several other possible workarounds in make/local that did not fix the problem:

  • activating STAN_CPP_OPTIMS but using CXXFLAGS+= -fno-whole-program-vtables
  • activating STAN_CPP_OPTIMS (or the equivalent CXXFLAGS) and adding LDFLAGS = -fwhole-program-vtables
  • activating STAN_CPP_OPTIMS (or the equivalent CXXFLAGS) and setting LDFLAGS to the whole list of CXXFLAGS_FLTO in line 71 of cmdstan/makefile

Perhaps it makes sense to check for arm64 and exclude -fwhole-program-vtables in this case?
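A minimal sketch of what such a check might look like; this is not the actual CmdStan makefile. It assumes GNU make, that `uname -s` and `uname -m` report `Darwin` and `arm64` on a native Apple Silicon build, and that CXXFLAGS_FLTO has already been populated at the point where the guard runs. The names HOST_OS and HOST_ARCH are hypothetical.

```make
# Hypothetical guard, placed after CXXFLAGS_FLTO has been set.
HOST_OS   := $(shell uname -s)
HOST_ARCH := $(shell uname -m)

ifeq ($(HOST_OS),Darwin)
ifeq ($(HOST_ARCH),arm64)
  # Drop only the flag that triggers the ld assertion; keep the other LTO flags.
  CXXFLAGS_FLTO := $(filter-out -fwhole-program-vtables,$(CXXFLAGS_FLTO))
endif
endif
```

Using `filter-out` instead of redefining the list leaves the remaining flags untouched on every platform, so x86 builds would behave exactly as before.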

Reproducible Steps

  1. Copy the make/local template to make/local.
  2. Uncomment the STAN_CPP_OPTIMS line.
  3. Clean and rebuild (cmdstanr::rebuild_cmdstan()).
  4. Try to compile any Stan model (cmdstanr::cmdstan_model(…)).

Current Output

See above for the error message. Because the Stan model fails to link, there is no other output: there is no model binary to use for sampling.

Expected Output

Stan models should link without errors.

Additional Information

Rebuilding cmdstan yields the following warnings, regardless of the make/local setup.

ld: warning: cannot export hidden symbol typeinfo for tbb::tbb_exception from task_group_context.o
ld: warning: cannot export hidden symbol typeinfo name for tbb::tbb_exception from task_group_context.o
ld: warning: cannot export hidden symbol typeinfo name for tbb::empty_task from arena.o
ld: warning: cannot export hidden symbol typeinfo for tbb::empty_task from arena.o
ld: warning: cannot export hidden symbol typeinfo name for tbb::empty_task from scheduler.o
ld: warning: cannot export hidden symbol typeinfo name for tbb::tbb_exception from scheduler.o
ld: warning: cannot export hidden symbol typeinfo for tbb::empty_task from scheduler.o
ld: warning: cannot export hidden symbol typeinfo for tbb::tbb_exception from scheduler.o

TBB was a headache for M1 machines all over the place (see here, for example). I’m not good enough with the guts of LLVM to understand link-time optimisation well, but could it be that these hidden TBB symbols are breaking the whole-program LTO?

Current Version

v2.27.0

jaburgoyne • Sep 24 '21 12:09

Thanks for the report!

I think the simplest solution will be to exclude this flag for arm64.

rok-cesnovar • Sep 25 '21 14:09

Has this issue been resolved?

saudiwin • Jan 31 '23 09:01

It does seem to be working for me now on an M1 using just STAN_CPP_OPTIMS=true, although I haven't checked carefully to see what changed in the underlying code.

jaburgoyne • Feb 01 '23 10:02

I don’t believe we have made any intentional change, but it’s possible that OS updates have improved the M1’s parity with x86 in ways that helped here.

WardBrian • Feb 01 '23 13:02