Estimator unit test fails on M1 Mac with gcc

Open camelto2 opened this issue 3 years ago • 16 comments

Describe the bug

The develop branch seems to have a bug, at least on my Mac. I haven't been able to reproduce it anywhere else. It looks like a bus error in unit_test_estimator.

(Note also that to compile at all on the M1 right now with Homebrew g++, you have to change your CLT from v14. CLT v14 introduced a bug that seems to break linking. The current solution is to download a previous CLT or use the 14.1 beta CLT. Either can be downloaded from Apple Developer.)

To Reproduce

git checkout develop
build_dir=build_gcc
mkdir -p $build_dir
cd $build_dir
CC=gcc-12
CXX=g++-12
cmake -D QMC_MPI=0 \
  -D CMAKE_C_COMPILER=$CC \
  -D CMAKE_CXX_COMPILER=$CXX \
  -D QMC_COMPLEX=1 \
  ..
make -j 8
ctest -R unit_estimators

Expected behavior

The test shouldn't fail.

System:

  • M1 Mac, running Monterey 12.6

Additional context

I talked through this with Ye at the All-hands meeting, and the cause is still unclear to us. When I recompile with -g, lldb gives the backtrace below.

The issue seems to be in InputSection::setFromValue. It is failing on the "count" name being passed in as RealType(15.)

Not really sure why it is failing here but passing elsewhere. @ye-luo told me to ping @PDoakORNL to see if he had any ideas.

 * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x102f8e105)

  * frame #0: 0x0000000195f17e0c libc++abi.dylib`__cxxabiv1::__class_type_info::process_static_type_above_dst(__cxxabiv1::__dynamic_cast_info*, void const*, void const*, int) const + 4
    frame #1: 0x0000000102e0ba04 libstdc++.6.dylib`get_adjusted_ptr(std::type_info const*, std::type_info const*, void**) + 100
    frame #2: 0x0000000102e0c2ac libstdc++.6.dylib`__gxx_personality_v0 + 1260
    frame #3: 0x00000001a0a8f380 libunwind.dylib`_Unwind_RaiseException + 576
    frame #4: 0x0000000102e0ca84 libstdc++.6.dylib`__cxa_throw + 84
    frame #5: 0x0000000100064ee8 test_estimators`std::__throw_bad_any_cast() at any:64:24
    frame #6: 0x0000000100072090 test_estimators`int std::any_cast<int>(__any=0x000060000170c468) at any:471:27
    frame #7: 0x0000000100161238 test_estimators`void qmcplusplus::InputSection::setFromValue<std::any>(this=0x000000016fdfbca8, name=0x000060000170c448, value=0x000060000170c468) at InputSection.cpp:149:39
    frame #8: 0x0000000100160634 test_estimators`qmcplusplus::InputSection::init(this=0x000000016fdfbca8, init_values=0x000000016fdfd6f0) at InputSection.cpp:73:17
    frame #9: 0x000000010006df8c test_estimators`::____C_A_T_C_H____T_E_S_T____9() at test_InputSection.cpp:320:5
    frame #10: 0x00000001000b2b00 test_estimators`Catch::TestInvokerAsFunction::invoke(this=0x0000600000004200) const at catch.hpp:14321:25
    frame #11: 0x00000001000b1d70 test_estimators`Catch::TestCase::invoke(this=0x00000001038126d8) const at catch.hpp:14160:21
    frame #12: 0x00000001000abb88 test_estimators`Catch::RunContext::invokeActiveTestCase(this=0x000000016fdfde58) at catch.hpp:13020:33
    frame #13: 0x00000001000ab9a4 test_estimators`Catch::RunContext::runCurrentTest(this=0x000000016fdfde58, redirectedCout=0x000000016fdfdb08, redirectedCerr=0x000000016fdfdae8) at catch.hpp:12993:37
    frame #14: 0x00000001000aa97c test_estimators`Catch::RunContext::runTest(this=0x000000016fdfde58, testCase=0x00000001038126d8) at catch.hpp:12754:27
    frame #15: 0x00000001000ad278 test_estimators`TestGroup::execute(this=0x000000016fdfde48) const at catch.hpp:13347:52
    frame #16: 0x00000001000aec78 test_estimators`Catch::Session::runInternal(this=0x000000016fdfe1b8) at catch.hpp:13553:46
    frame #17: 0x00000001000ae9b0 test_estimators`Catch::Session::run(this=0x000000016fdfe1b8) at catch.hpp:13509:35
    frame #18: 0x00000001000d4aa4 test_estimators`int Catch::Session::run<char>(this=0x000000016fdfe1b8, argc=1, argv=0x000000016fdfe848) at catch.hpp:13231:33
    frame #19: 0x00000001000c5e30 test_estimators`main(argc=1, argv=0x000000016fdfe848) at catch_main.cpp:64:27
    frame #20: 0x000000010243d08c dyld`start + 520

camelto2, Oct 19 '22 20:10
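
Frames #5 to #7 of the backtrace show `std::any_cast<int>` throwing `std::bad_any_cast` from inside `InputSection::setFromValue`. A minimal sketch of that mechanism, independent of QMCPACK, might look like the following. The names `read_as_int` and `check_any_cast_behaviour` are hypothetical, and the sketch assumes only what the report suggests: a real-valued "count" reaches an `any_cast<int>`.

```cpp
#include <any>
#include <cassert>
#include <string>

// Hypothetical helper, not QMCPACK code: tries to read an int out of a
// std::any and reports the outcome instead of letting the exception escape.
std::string read_as_int(const std::any& value)
{
  try
  {
    // std::any_cast<int> throws std::bad_any_cast when the stored type is not int.
    int count = std::any_cast<int>(value);
    return "int " + std::to_string(count);
  }
  catch (const std::bad_any_cast&)
  {
    return "bad_any_cast";
  }
}

// Behaviour expected on a toolchain whose exception handling works.
void check_any_cast_behaviour()
{
  assert(read_as_int(std::any(15)) == "int 15");        // stored int: cast succeeds
  assert(read_as_int(std::any(15.0)) == "bad_any_cast"); // stored double: cast throws
}
```

If calling `check_any_cast_behaviour()` from a small standalone program crashed in the same way on an affected machine, that would point toward the compiler and runtime rather than QMCPACK itself. This is a suggestion for narrowing the problem down, not something the thread has confirmed.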

Do the tests pass with a non-complex build?

prckent, Oct 23 '22 22:10

> Do the tests pass with a non-complex build?

No, it fails regardless of real/complex.

camelto2, Oct 24 '22 19:10

Also, I tried to compile with address sanitizer support, but it seems that the Homebrew GNU compilers don't come with the sanitizer libraries for the M1, whereas for Intel Macs the libraries are there.

camelto2, Oct 25 '22 16:10

With the release of Command Line Tools 14.1 I was able to independently reproduce this.

prckent, Nov 01 '22 20:11

This issue remains with gcc-12 on my Mac. I believe it is an issue with gcc on Mac.

ye-luo, Feb 04 '23 16:02

Does macports or brew installed clang have any issues? It would be good to have a recommendable route and to update the build recipe in the manual.

prckent, Feb 04 '23 17:02

> Does macports or brew installed clang have any issues? It would be good to have a recommendable route and to update the build recipe in the manual.

I only tried brew. The issue with clang was that I failed to find a working C++ standard library advanced enough for QMCPACK's needs.

ye-luo, Feb 04 '23 18:02

So to bring me into the loop here, is this still just an M1 phenomenon?

PDoakORNL, Feb 06 '23 15:02

The ARM-based Orange PI reporting at https://cdash.qmcpack.org/CDash/viewTest.php?onlyfailed&buildid=396302 shows only numerical "differences"/failures in a deterministic optimizer test. No x86 builds are failing. => only issues on M1 so far.

Has anyone already tried the Spack route instead of MacPorts or brew?

prckent, Feb 06 '23 15:02

Reproduced with macports gcc 12.2.0

prckent, Feb 06 '23 21:02

I just hit this problem using gcc13 installed via Homebrew on an M1 laptop running macOS Monterey 12.7. Clang seems to still be problematic for the reasons Ye mentioned earlier.

jptowns, Oct 27 '23 22:10

Tried reinvestigating this just now on an M1 with Sonoma 14.1. With #4815 I was finally able to build with AppleClang (!) and this test passed. Builds with gcc13 from macports result in a failing estimator unit test (only), but I notice they were also configured with OpenBLAS, while the AppleClang one picked up the preferred Accelerate framework. There might be other potentially significant differences. fftw-3, hdf5, and boost were from macports. Note that the mpich and openmpi ports have issues, so we aren't yet at a clean and easy build solution on Apple where everything "just works" as expected.

prckent, Nov 03 '23 21:11

Not an Accelerate/OpenBLAS issue. Builds differing only by appleclang/gcc-13 fail only for the gcc-13 case.

prckent, Nov 08 '23 16:11

Playing around with this, I found that the bus error results from either of the CHECK_THROWS_AS tests in the InputSection::init TEST_CASE in test_InputSection.cpp. With both of them commented out, all the tests pass with gcc 13.2 from macports.

TEST_CASE("InputSection::init", "[estimators]")
{
  SECTION("bad type handling")
  {
    TestInputSection ti;
    //CHECK_THROWS_AS(ti.init({{"full", bool(false)}, {"count", int(15)}, {"width", int(10)}}), UniformCommunicateError);
  }
 ...
  SECTION("invalid type assignment")
  {
    TestInputSection ti;
    //CHECK_THROWS_AS(ti.init({{"full", bool(false)}, {"count", Real(15.)}}), UniformCommunicateError);
  }

prckent, Nov 08 '23 19:11
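
Both commented-out assertions rely on an exception leaving `ti.init(...)` and being caught by type. The sketch below shows roughly what such a check has to do, without Catch2 or QMCPACK, so the throw-and-catch path can be exercised on its own. `StandInError`, `always_fails`, `throws_as` and `check_throws_as` are hypothetical names. That the stand-in derives from `std::runtime_error` is an assumption made for illustration, not a statement about `UniformCommunicateError`.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the exception type named in the test.
// Deriving from std::runtime_error is an assumption for this sketch.
struct StandInError : public std::runtime_error
{
  using std::runtime_error::runtime_error;
};

// Hypothetical function that always fails, playing the role of ti.init(...).
void always_fails() { throw StandInError("invalid type assignment"); }

// Runs f and reports whether an exception of type Expected (or a type
// derived from it) propagated out of the call.
template<typename Expected, typename Callable>
bool throws_as(Callable&& f)
{
  try
  {
    f();
  }
  catch (const Expected&)
  {
    return true; // the expected type was thrown
  }
  catch (...)
  {
    return false; // a different exception type was thrown
  }
  return false; // nothing was thrown
}

// Behaviour expected on a toolchain whose exception handling works.
void check_throws_as()
{
  assert(throws_as<StandInError>(always_fails));
  assert(throws_as<std::runtime_error>(always_fails)); // caught via the base class
  assert(!throws_as<std::logic_error>(always_fails));  // unrelated type
  assert(!throws_as<StandInError>([] {}));             // no throw at all
}
```

Building this with the same gcc used for the failing test, and calling `check_throws_as()`, would show whether a plain throw and typed catch already misbehaves there.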

As far as I can tell, gcc 13 does not officially support Apple M1 at all. Looking at the Homebrew formula, it's pulling in an unmerged branch from a well-known but unofficial repo. I don't see any reason why we should support it or even look into it any further. Use a compiler where support for M1 has actually been merged.

I would suggest we only officially support mainline LLVM on macOS.

PDoakORNL, Nov 10 '23 01:11

Noting that this problem still exists on Sequoia 15.3 with macports gcc 13.3.0 and 14.2.0. However, it does not occur with macports clang 18 or 19 or Apple Clang 16.0.0 (clang-1600.0.26.6).

Unfortunately, with all the clangs, deterministic-unit_test_utilities_for_testing is also broken in test_NativeInitializerPrint.cpp. The full deterministic test set otherwise passes.

prckent, Feb 10 '25 23:02
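
Since the outcome in this thread depends on which compiler and standard library built the test, a small helper that reports both from predefined macros can make results easier to compare. This is a sketch only. `toolchain_summary` and `toolchain_summary_self_check` are hypothetical names, and the macros used are the usual predefined ones for GCC, Clang, AppleClang, libstdc++ and libc++.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: summarises compiler, standard library and architecture
// as "compiler, library, architecture" using predefined macros.
std::string toolchain_summary()
{
  std::ostringstream out;

  // AppleClang also defines __clang__, and clang also defines __GNUC__,
  // so the most specific check has to come first.
#if defined(__apple_build_version__)
  out << "AppleClang " << __clang_major__ << "." << __clang_minor__;
#elif defined(__clang__)
  out << "clang " << __clang_major__ << "." << __clang_minor__;
#elif defined(__GNUC__)
  out << "gcc " << __GNUC__ << "." << __GNUC_MINOR__;
#else
  out << "unknown compiler";
#endif

  // The library macros are defined once any standard header is included.
#if defined(_LIBCPP_VERSION)
  out << ", libc++";
#elif defined(__GLIBCXX__)
  out << ", libstdc++";
#else
  out << ", unknown standard library";
#endif

#if defined(__aarch64__) || defined(__arm64__)
  out << ", arm64";
#elif defined(__x86_64__)
  out << ", x86_64";
#else
  out << ", other architecture";
#endif

  return out.str();
}

// Sanity check: the summary always consists of three comma-separated fields.
void toolchain_summary_self_check()
{
  const std::string s = toolchain_summary();
  assert(std::count(s.begin(), s.end(), ',') == 2);
}
```

Printing `toolchain_summary()` from the same build that runs the estimator tests gives one line that could be pasted alongside a pass or fail report.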