Estimator unit test fails on M1 Mac with gcc
Describe the bug
The develop branch seems to have a bug, at least on my Mac; I haven't been able to reproduce it anywhere else. It looks like a bus error in unit_test_estimator.
(Note also: to compile at all on the M1 right now with Homebrew g++, you have to move off Command Line Tools (CLT) v14. CLT v14 introduced a bug that appears to break linking. The current workaround is to install an earlier CLT or the 14.1 beta CLT; both can be downloaded from the Apple developer site.)
To Reproduce
git checkout develop
build_dir=build_gcc
mkdir -p $build_dir
cd $build_dir
CC=gcc-12
CXX=g++-12
cmake -D QMC_MPI=0 \
  -D CMAKE_C_COMPILER=$CC \
  -D CMAKE_CXX_COMPILER=$CXX \
  -D QMC_COMPLEX=1 \
  ..
make -j 8
ctest -R unit_estimators
Expected behavior
The test shouldn't fail.
System:
- M1 Mac, running Monterey 12.6
Additional context
I talked through this with Ye at the All-hands meeting, and the cause is still unclear to us. When I recompile with -g, lldb gives the backtrace below.
The issue seems to be in InputSection::setFromValue. It fails when the "count" value is passed in as RealType(15.).
I'm not sure why it fails here but passes elsewhere. @ye-luo told me to ping @PDoakORNL to see if he had any ideas.
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x102f8e105)
* frame #0: 0x0000000195f17e0c libc++abi.dylib`__cxxabiv1::__class_type_info::process_static_type_above_dst(__cxxabiv1::__dynamic_cast_info*, void const*, void const*, int) const + 4
frame #1: 0x0000000102e0ba04 libstdc++.6.dylib`get_adjusted_ptr(std::type_info const*, std::type_info const*, void**) + 100
frame #2: 0x0000000102e0c2ac libstdc++.6.dylib`__gxx_personality_v0 + 1260
frame #3: 0x00000001a0a8f380 libunwind.dylib`_Unwind_RaiseException + 576
frame #4: 0x0000000102e0ca84 libstdc++.6.dylib`__cxa_throw + 84
frame #5: 0x0000000100064ee8 test_estimators`std::__throw_bad_any_cast() at any:64:24
frame #6: 0x0000000100072090 test_estimators`int std::any_cast<int>(__any=0x000060000170c468) at any:471:27
frame #7: 0x0000000100161238 test_estimators`void qmcplusplus::InputSection::setFromValue<std::any>(this=0x000000016fdfbca8, name=0x000060000170c448, value=0x000060000170c468) at InputSection.cpp:149:39
frame #8: 0x0000000100160634 test_estimators`qmcplusplus::InputSection::init(this=0x000000016fdfbca8, init_values=0x000000016fdfd6f0) at InputSection.cpp:73:17
frame #9: 0x000000010006df8c test_estimators`::____C_A_T_C_H____T_E_S_T____9() at test_InputSection.cpp:320:5
frame #10: 0x00000001000b2b00 test_estimators`Catch::TestInvokerAsFunction::invoke(this=0x0000600000004200) const at catch.hpp:14321:25
frame #11: 0x00000001000b1d70 test_estimators`Catch::TestCase::invoke(this=0x00000001038126d8) const at catch.hpp:14160:21
frame #12: 0x00000001000abb88 test_estimators`Catch::RunContext::invokeActiveTestCase(this=0x000000016fdfde58) at catch.hpp:13020:33
frame #13: 0x00000001000ab9a4 test_estimators`Catch::RunContext::runCurrentTest(this=0x000000016fdfde58, redirectedCout=0x000000016fdfdb08, redirectedCerr=0x000000016fdfdae8) at catch.hpp:12993:37
frame #14: 0x00000001000aa97c test_estimators`Catch::RunContext::runTest(this=0x000000016fdfde58, testCase=0x00000001038126d8) at catch.hpp:12754:27
frame #15: 0x00000001000ad278 test_estimators`TestGroup::execute(this=0x000000016fdfde48) const at catch.hpp:13347:52
frame #16: 0x00000001000aec78 test_estimators`Catch::Session::runInternal(this=0x000000016fdfe1b8) at catch.hpp:13553:46
frame #17: 0x00000001000ae9b0 test_estimators`Catch::Session::run(this=0x000000016fdfe1b8) at catch.hpp:13509:35
frame #18: 0x00000001000d4aa4 test_estimators`int Catch::Session::run<char>(this=0x000000016fdfe1b8, argc=1, argv=0x000000016fdfe848) at catch.hpp:13231:33
frame #19: 0x00000001000c5e30 test_estimators`main(argc=1, argv=0x000000016fdfe848) at catch_main.cpp:64:27
frame #20: 0x000000010243d08c dyld`start + 520
Do the tests pass with a non-complex build?
No, it fails regardless of real/complex
Also, I tried to compile with address sanitizer support, but it seems that the Homebrew GNU compilers don't ship the sanitizer libraries for the M1, whereas on Intel Macs the libraries are there.
With the release of command line tools 14.1 I was able to independently reproduce this.
This issue remains with gcc-12 on my Mac. I believe it is an issue with gcc on macOS.
Does macports or brew installed clang have any issues? It would be good to have a recommendable route and to update the build recipe in the manual.
I only tried brew. The problem with clang was that I could not find a working C++ standard library recent enough for QMCPACK's needs.
So to bring me into the loop here, is this still just an M1 phenomenon?
The ARM-based Orange PI reporting at https://cdash.qmcpack.org/CDash/viewTest.php?onlyfailed&buildid=396302 shows only numerical "differences"/failures in a deterministic optimizer test. No x86 builds are failing. => only issues on M1 so far.
Has anyone already tried the spack route instead of macports or brew?
Reproduced with macports gcc 12.2.0
I just hit this problem using gcc13 installed via homebrew on an M1 laptop running macOS Monterey 12.7. Clang still seems to be problematic for the reasons Ye mentioned earlier.
Tried reinvestigating this just now on an M1 with Sonoma 14.1. With #4815 I was finally able to build with AppleClang (!) and this test passed. Builds with gcc13 from macports result in a failing estimator unit test (only), but I notice they were also configured with OpenBLAS, while the AppleClang build picked up the preferred Accelerate framework. There might be other potentially significant differences. fftw-3, hdf5, and boost were from macports. Note that the mpich and openmpi ports have issues, so we aren't yet at a clean and easy build solution on Apple where everything "just works" as expected.
Not an Accelerate/OpenBLAS issue. Builds differing only by appleclang/gcc-13 fail only for the gcc-13 case.
Playing around with this, I found that the bus error results from either of the CHECK_THROWS_AS tests in the InputSection::init TEST_CASE in test_InputSection.cpp. With both of them commented out, all the tests pass with gcc 13.2 from macports.
TEST_CASE("InputSection::init", "[estimators]")
{
  SECTION("bad type handling")
  {
    TestInputSection ti;
    //CHECK_THROWS_AS(ti.init({{"full", bool(false)}, {"count", int(15)}, {"width", int(10)}}), UniformCommunicateError);
  }
  ...
  SECTION("invalid type assignment")
  {
    TestInputSection ti;
    //CHECK_THROWS_AS(ti.init({{"full", bool(false)}, {"count", Real(15.)}}), UniformCommunicateError);
  }
}
As far as I can tell, gcc 13 does not officially support Apple M1 at all. Looking at the homebrew formula, it's pulling in an unmerged branch from a well-known but unofficial repo. I don't see any reason why we should support it or even look into it any further. Use a compiler where support for M1 has actually been merged.
I would suggest we only officially support mainline llvm on osx.
Noting that this problem still exists on Sequoia 15.3 with macports gcc 13.3.0 and 14.2.0. However, it does not occur with macports clang 18 or 19 or Apple Clang 16.0.0 (clang-1600.0.26.6).
Unfortunately with all the clangs deterministic-unit_test_utilities_for_testing is also broken in test_NativeInitializerPrint.cpp. The full deterministic test set otherwise passes.