Relion4 Class2D problems on M1 Mac
I can compile and run Relion4 on my ARM-based MacBook Pro with an OpenMP-related change to the CMakeLists.txt, but Class2D either deadlocks (VDAM) or crashes with a memory allocation error (EM). Info below. Only CPU code, of course. James Conway, U Pittsburgh: [email protected]
1. System info
MacBook Pro 16-inch, 2021, M1 Max, 64 GB RAM. macOS 12.5.1 (current), current Xcode 13.4.1. Additional tools installed via Homebrew:
- XQuartz 2.8.2
- gcc 12.2.0
- cmake 3.24.1
- open-mpi 4.1.4
- fltk 1.3.8
- fftw 3.3.10
- libomp 14.0.6
2. Symbols defined (csh)
setenv CXX g++-12
setenv CC gcc-12
setenv OMPI_CXX g++-12
setenv OMPI_CC gcc-12
setenv PATH "/opt/homebrew/opt/openmpi/bin:${PATH}"
setenv CXXFLAGS "-I/opt/homebrew/opt/openmpi/include"
setenv LDFLAGS "-L/opt/homebrew/opt/openmpi/lib"
3. Changes to CMakeLists.txt to enable OpenMP
This block is inserted between the OpenMPI block and the Intel Compiler support block:
# ----------------------------------------------------------------------------OpenMP-- James Conway
# This block from: https://code-examples.net/en/q/10d10e9
# Use of -Xpreprocessor suggested here: https://stackoverflow.com/questions/40095958/apple-clang-fopenmp-not-working
# Still a linker problem with OMP: ld: symbol(s) not found for architecture arm64
OPTION (USE_OpenMP "Use OpenMP to enable <omp.h>" ON)
# Find OpenMP
if(APPLE AND USE_OpenMP)
    if(CMAKE_C_COMPILER_ID MATCHES "Clang")
        set(OpenMP_C "${CMAKE_C_COMPILER}")
        set(OpenMP_C_FLAGS "-Xpreprocessor -fopenmp -Wno-unused-command-line-argument")
        set(OpenMP_C_LIB_NAMES "libomp" "libgomp" "libiomp5")
        set(OpenMP_libomp_LIBRARY ${OpenMP_C_LIB_NAMES})
        set(OpenMP_libgomp_LIBRARY ${OpenMP_C_LIB_NAMES})
        set(OpenMP_libiomp5_LIBRARY ${OpenMP_C_LIB_NAMES})
    endif()
    if(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
        set(OpenMP_CXX "${CMAKE_CXX_COMPILER}")
        set(OpenMP_CXX_FLAGS "-Xpreprocessor -fopenmp -Wno-unused-command-line-argument")
        set(OpenMP_CXX_LIB_NAMES "libomp" "libgomp" "libiomp5")
        set(OpenMP_libomp_LIBRARY ${OpenMP_CXX_LIB_NAMES})
        set(OpenMP_libgomp_LIBRARY ${OpenMP_CXX_LIB_NAMES})
        set(OpenMP_libiomp5_LIBRARY ${OpenMP_CXX_LIB_NAMES})
    endif()
endif()
if(USE_OpenMP)
    find_package(OpenMP REQUIRED)
endif(USE_OpenMP)
if(OPENMP_FOUND)
    # include_directories("${OPENMP_INCLUDES},/opt/homebrew/include")
    # link_directories("${OPENMP_LIBRARIES},/opt/homebrew/lib")
    include_directories("/opt/homebrew/include")
    link_directories("/opt/homebrew/lib")
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
    # set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
endif(OPENMP_FOUND)
4. Compiling Relion4 (4.0-beta-2-commit-3b1752) with the CMakeLists.txt modified as above
git clone https://github.com/3dem/relion.git
cd relion
git checkout 4.0
mkdir -p build
cd build
cmake ..
make -j 6
make install
5. Running Relion4 (4.0-beta-2-commit-3b1752) on the Tutorial beta-galactosidase dataset
- Import, Motion correction: no problem
- CtfEstimation: has to be done elsewhere because I can't get CTFFIND4 to compile
- AutoPick, Extract: no problem (Extract: 5793 particles, 256x256 scaled to 64x64, 3.54 A/pixel)
- Class2D: deadlocks (VDAM) or crashes (EM) as described below
6. Class2D - VDAM with 1 MPI (required) and 2 threads
This hangs with an apparent deadlock, with both 2 threads and 1 thread:
`which relion_refine` --o Class2D/job010/run --grad --class_inactivity_threshold 0.1 --grad_write_iter 10 --iter 200 --i Extract/job009/particles.star --dont_combine_weights_via_disc --preread_images --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 50 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 2 --pipeline_control Class2D/job010/
Running CPU instructions in double precision.
Initial subset size set to 200
Final subset size set to 1000
Estimating initial noise spectra
0/ 0 sec ............................................................~~(,_,">
Estimating accuracies in the orientational assignment ...
0/ 0 sec ............................................................~~(,_,">
Auto-refine: Estimated accuracy angles= 17.9 degrees; offsets= 9.912 Angstroms
CurrentResolution= 56.64 Angstroms, which requires orientationSampling of at least 30 degrees for a particle of diameter 200 Angstroms
Oversampling= 0 NrHiddenVariableSamplingPoints= 31500
OrientationalSampling= 12 NrOrientations= 30
TranslationalSampling= 7.08 NrTranslations= 21
=============================
Oversampling= 1 NrHiddenVariableSamplingPoints= 1008000
OrientationalSampling= 6 NrOrientations= 240
TranslationalSampling= 3.54 NrTranslations= 84
=============================
Gradient optimisation iteration 1 of 200 with 200 particles (Step size 0.9)
000/??? sec ~~(,_,"> [oo]
This process never progresses. Sampling it seems to show it's in a deadlock:
~ 2.645s Thread 16621156 DispatchQueue_1: com.apple.main-thread (serial)
~ 2.645s start (in dyld) + 520 [0x100f9908c]
~ 2.645s main (in relion_refine) + 80 [0x100b7d550]
~ 2.645s MlOptimiser::iterate() (in relion_refine) + 340 [0x100b2c954]
~ 2.645s MlOptimiser::expectation() (in relion_refine) + 844 [0x100b109e0]
~ 2.645s MlOptimiser::expectationSomeParticles(long, long) (in relion_refine) + 1124 [0x100b0f6b4]
~ 2.645s GOMP_parallel (in libgomp.1.dylib) + 84 [0x1012f3c74]
~ 2.645s MlOptimiser::expectationSomeParticles(long, long) (._omp_fn.0) (in relion_refine) + 100 [0x100b1df48]
~ 2.645s globalThreadExpectationSomeParticles(void*, int) (in relion_refine) + 100 [0x100b1de34]
~ 2.645s MlOptimiser::expectationOneParticle(long, int) (in relion_refine) + 1824 [0x100b1cdf4]
~ 2.645s MlOptimiser::storeWeightedSums(long, int, int, int, int, int, int, int, int, int, int, int, std::vector<double, std::allocator<double> >&, st
~ 2.645s _pthread_mutex_firstfit_lock_slow (in libsystem_pthread.dylib) + 248 [0x193a36cf8]
~ 2.645s _pthread_mutex_firstfit_lock_wait (in libsystem_pthread.dylib) + 84 [0x193a39384]
~ 2.645s __psynch_mutexwait (in libsystem_kernel.dylib) + 8 [0x193a01738]
~ 2.645s Thread 16621171
~ 2.645s thread_start (in libsystem_pthread.dylib) + 8 [0x193a3708c]
~ 2.645s _pthread_start (in libsystem_pthread.dylib) + 148 [0x193a3c26c]
~ 2.645s gomp_thread_start (in libgomp.1.dylib) + 308 [0x1012fa9b8]
~ 2.645s MlOptimiser::expectationSomeParticles(long, long) (._omp_fn.0) (in relion_refine) + 100 [0x100b1df48]
~ 2.645s globalThreadExpectationSomeParticles(void*, int) (in relion_refine) + 56 [0x100b1de08]
~ 2.645s ParallelTaskDistributor::getTasks(unsigned long&, unsigned long&) (in relion_refine) + 60 [0x100b3348c]
~ 2.645s _pthread_mutex_fairshare_lock_slow (in libsystem_pthread.dylib) + 196 [0x193a3f308]
~ 2.645s _pthread_mutex_fairshare_lock_wait (in libsystem_pthread.dylib) + 84 [0x193a3f3b0]
~ 2.645s __psynch_mutexwait (in libsystem_kernel.dylib) + 8 [0x193a01738]
If I repeat with just one thread, the result is the same:
~ 2.651s Thread 16603999 DispatchQueue_1: com.apple.main-thread (serial)
~ 2.651s start (in dyld) + 520 [0x1046fd08c]
~ 2.651s main (in relion_refine) + 80 [0x104515550]
~ 2.651s MlOptimiser::iterate() (in relion_refine) + 340 [0x1044c4954]
~ 2.651s MlOptimiser::expectation() (in relion_refine) + 844 [0x1044a89e0]
~ 2.651s MlOptimiser::expectationSomeParticles(long, long) (in relion_refine) + 1124 [0x1044a76b4]
~ 2.651s GOMP_parallel (in libgomp.1.dylib) + 84 [0x104c83c74]
~ 2.651s MlOptimiser::expectationSomeParticles(long, long) (._omp_fn.0) (in relion_refine) + 100 [0x1044b5f48]
~ 2.651s globalThreadExpectationSomeParticles(void*, int) (in relion_refine) + 56 [0x1044b5e08]
~ 2.651s ParallelTaskDistributor::getTasks(unsigned long&, unsigned long&) (in relion_refine) + 60 [0x1044cb48c]
~ 2.651s _pthread_mutex_firstfit_lock_slow (in libsystem_pthread.dylib) + 248 [0x193a36cf8]
~ 2.651s _pthread_mutex_firstfit_lock_wait (in libsystem_pthread.dylib) + 84 [0x193a39384]
~ 2.651s __psynch_mutexwait (in libsystem_kernel.dylib) + 8 [0x193a01738]
7. Class2D - EM
This crashes with a malloc error:
`which relion_refine_mpi` --o Class2D/job012/run --iter 25 --i Extract/job009/particles.star --dont_combine_weights_via_disc --preread_images --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 50 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 2 --pipeline_control Class2D/job012/
RELION version: 4.0-beta-2-commit-3b1752
Precision: BASE=double
=== RELION MPI setup ===
+ Number of MPI processes = 2
+ Number of threads per MPI process = 2
+ Total number of threads therefore = 4
+ Leader (0) runs on host = JFC-MacBookPro-2022
=================
+ Follower 1 runs on host = JFC-MacBookPro-2022
Running CPU instructions in double precision.
Estimating initial noise spectra
0/ 0 sec ............................................................~~(,_,">
[JFC-MacBookPro-2022:36584] *** Process received signal ***
[JFC-MacBookPro-2022:36584] Signal: Abort trap: 6 (6)
[JFC-MacBookPro-2022:36584] Signal code: (0)
[JFC-MacBookPro-2022:36584] [ 0] 0 libsystem_platform.dylib 0x0000000193a534a4 _sigtramp + 56
[JFC-MacBookPro-2022:36584] [ 1] 0 libsystem_pthread.dylib 0x0000000193a3bee0 pthread_kill + 288
[JFC-MacBookPro-2022:36584] [ 2] 0 libsystem_c.dylib 0x0000000193976340 abort + 168
[JFC-MacBookPro-2022:36584] [ 3] 0 libsystem_malloc.dylib 0x00000001938588c0 has_default_zone0 + 0
[JFC-MacBookPro-2022:36584] [ 4] 0 libsystem_malloc.dylib 0x000000019386dc84 malloc_zone_error + 100
[JFC-MacBookPro-2022:36584] [ 5] 0 libsystem_malloc.dylib 0x000000019384abc4 nanov2_allocate_from_block + 568
[JFC-MacBookPro-2022:36584] [ 6] 0 libsystem_malloc.dylib 0x000000019384a1e0 nanov2_allocate + 128
[JFC-MacBookPro-2022:36584] [ 7] 0 libsystem_malloc.dylib 0x000000019384a0fc nanov2_malloc + 64
[JFC-MacBookPro-2022:36584] [ 8] 0 libsystem_malloc.dylib 0x0000000193867748 _malloc_zone_malloc + 156
[JFC-MacBookPro-2022:36584] [ 9] 0 relion_refine_mpi 0x0000000104a071dc _ZN14MlOptimiserMpi11expectationEv + 124
[JFC-MacBookPro-2022:36584] [10] 0 relion_refine_mpi 0x0000000104a1ff74 _ZN14MlOptimiserMpi7iterateEv + 2052
[JFC-MacBookPro-2022:36584] [11] 0 relion_refine_mpi 0x0000000104a7171c main + 92
[JFC-MacBookPro-2022:36584] [12] 0 dyld 0x0000000104e0108c start + 520
[JFC-MacBookPro-2022:36584] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node JFC-MacBookPro-2022 exited on signal 6 (Abort trap: 6).
--------------------------------------------------------------------------
Thank you very much for the details.
This is interesting. It is not obvious at the moment why these issues arise only on macOS platforms.
I do have an M1 MacBook Pro, so I will look into it when I have time. (But I cannot promise an ETA! I am afraid to say this is a low priority.)
On my computer, I managed to compile without tweaking CMakeLists.txt.
- MacBook Pro 16 (M1 Max, 2021)
- macOS 12.5.1
Installed the following from Homebrew:
- gcc-12 (Homebrew GCC 12.2.0) 12.2.0
- cmake version 3.24.1
- mpirun (Open MPI) 4.1.4
- fftw 3.3.10
- libomp 14.0.6
export CXX=g++-12
export CC=gcc-12
export OMPI_CXX=g++-12
export OMPI_CC=gcc-12
cmake .. -DGUI=OFF -DFETCH_TORCH_MODELS=OFF -DCMAKE_BUILD_TYPE=DEBUG
make
Oops, gdb does not support M1 Mac...
I have to learn lldb.
On my computer, the EM algorithm does not crash with a malloc error but deadlocks.
In lldb:
(lldb) r --o Class2D/em --iter 25 --i Extract/job009/particles.star --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 20 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 1
(Ctrl-C after it hangs)
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00000001c07ed738 libsystem_kernel.dylib`__psynch_mutexwait + 8
frame #1: 0x00000001c0825384 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 84
frame #2: 0x00000001c0822cf8 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 248
frame #3: 0x000000010018a61c relion_refine`MlOptimiser::storeWeightedSums(this=0x000000016fdfd180, part_id=9142, ibody=0, exp_current_oversampling=1, metadata_offset=0, exp_idir_min=0, exp_idir_max=0, exp_ipsi_min=0, exp_ipsi_max=29, exp_itrans_min=0, exp_itrans_max=20, exp_iclass_min=0, exp_iclass_max=19, exp_min_diff2=0x000000016fdfc7a8, exp_highres_Xi2_img=0x000000016fdfc7c0, exp_Fimg=0x000000016fdfc8f8, exp_Fimg_nomask=0x000000016fdfc8e0, exp_Fctf=0x000000016fdfc898, exp_power_img=0x000000016fdfc700, exp_old_offset=0x000000016fdfc748, exp_prior=0x000000016fdfc730, exp_Mweight=0x000000016fdfc5d0, exp_Mcoarse_significant=0x000000016fdfc668, exp_significant_weight=0x000000016fdfc778, exp_sum_weight=0x000000016fdfc790, exp_max_weight=0x000000016fdfc760, exp_pointer_dir_nonzeroprior=0x000000016fdfc838, exp_pointer_psi_nonzeroprior=0x000000016fdfc820, exp_directions_prior=0x000000016fdfc808, exp_psi_prior=0x000000016fdfc7f0, exp_local_Fimgs_shifted=0x000000016fdfc8c8, exp_local_Fimgs_shifted_nomask=0x000000016fdfc8b0, exp_local_Minvsigma2=0x000000016fdfc868, exp_local_Fctf=0x000000016fdfc880, exp_local_sqrtXi2=0x000000016fdfc7d8, exp_STMulti=0x000000016fdfc850) at ml_optimiser.cpp:8514:21
frame #4: 0x000000010017a3f4 relion_refine`MlOptimiser::expectationOneParticle(this=0x000000016fdfd180, part_id_sorted=0, thread_id=0) at ml_optimiser.cpp:4326:20
frame #5: 0x0000000100179ac8 relion_refine`MlOptimiser::doThreadExpectationSomeParticles(this=0x000000016fdfd180, thread_id=0) at ml_optimiser.cpp:4006:26
frame #6: 0x000000010014a95c relion_refine`globalThreadExpectationSomeParticles(self=0x000000016fdfd180, thread_id=0) at ml_optimiser.cpp:79:40
frame #7: 0x00000001001ca86c relion_refine`_ZN11MlOptimiser24expectationSomeParticlesEll._omp_fn.0((null)=0x000000016fdfce30) at ml_optimiser.cpp:3934:40
frame #8: 0x0000000100c97c74 libgomp.1.dylib`GOMP_parallel + 84
frame #9: 0x0000000100179978 relion_refine`MlOptimiser::expectationSomeParticles(this=0x000000016fdfd180, my_first_part_id=0, my_last_part_id=29) at ml_optimiser.cpp:3932:11
frame #10: 0x0000000100177bfc relion_refine`MlOptimiser::expectation(this=0x000000016fdfd180) at ml_optimiser.cpp:3388:27
frame #11: 0x0000000100176ef0 relion_refine`MlOptimiser::iterate(this=0x000000016fdfd180) at ml_optimiser.cpp:3013:14
frame #12: 0x000000010000a298 relion_refine`main(argc=34, argv=0x000000016fdff210) at refine.cpp:39:20
frame #13: 0x00000001005f908c dyld`start + 520
It is https://github.com/3dem/relion/blob/ver4.0/src/ml_optimiser.cpp#L8514.
This is really puzzling. The above is the only place global_mutex2 is locked. With only one thread, it cannot deadlock...
I also confirmed it is initialized properly before.
thread list confirms there is only one thread.
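The pattern at that line is essentially this (a paraphrase for illustration, not the exact RELION source):

// Paraphrase of the locking around ml_optimiser.cpp:8514, for illustration only.
// The backtrace above shows the lone thread blocking inside pthread_mutex_lock.
#include <pthread.h>

static pthread_mutex_t global_mutex2 = PTHREAD_MUTEX_INITIALIZER;

void storeWeightedSums_sketch()
{
    pthread_mutex_lock(&global_mutex2);   // the deadlocked frame
    // ... accumulate this particle's weighted sums into the shared model ...
    pthread_mutex_unlock(&global_mutex2);
}

A correctly initialized default pthread mutex locked by a single thread can never block here, so something must have damaged the mutex state itself.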
VDAM is stuck at the same place, MlOptimiser::storeWeightedSums, unlike the initial report of ParallelTaskDistributor::getTasks.
(lldb) r --o Class2D/vdam --grad --class_inactivity_threshold 0.1 --grad_write_iter 10 --iter 200 --i Extract/job009/particles.star --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 50 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --j 1
Repeating tests, I got a deadlock in ParallelTaskDistributor::getTasks.
Somehow OpenMP's mutex lock is not working as expected, but I don't know the cause.
> On my computer, I managed to compile without tweaking CMakeLists.txt.
> gcc-12 (Homebrew GCC 12.2.0) 12.2.0, cmake version 3.24.1, mpirun (Open MPI) 4.1.4, fftw 3.3.10, libomp 14.0.6
> cmake .. -DGUI=OFF -DFETCH_TORCH_MODELS=OFF
> make
I didn't use -DGUI=OFF or -DFETCH_TORCH_MODELS=OFF. The GUI worked OK, and I tend to use Relion that way. The torch stuff doesn't seem to make a difference.
> On my computer, the EM algorithm does not crash with a malloc error but deadlocks. In lldb: ... It is probably https://github.com/3dem/relion/blob/ver4.0/src/ml_optimiser.cpp#L8514.
> This is really puzzling. The above is the only place global_mutex2 is locked. With only one thread, it cannot deadlock...
That is the most curious point.
> I also confirmed it is initialized properly before.
> VDAM is stuck at the same place, MlOptimiser::storeWeightedSums, unlike the initial report of ParallelTaskDistributor::getTasks.
> (lldb) r --o Class2D/vdam --grad --class_inactivity_threshold 0.1 --grad_write_iter 10 --iter 200 --i Extract/job009/particles.star --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 50 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --j 1
My sampler output also showed the deadlock in MlOptimiser::storeWeightedSums.
> Repeating tests, I got a deadlock in ParallelTaskDistributor::getTasks. Somehow OpenMP's mutex lock is not working as expected, but I don't know the cause.
You managed to compile without the additional flags for OpenMP in CMakeLists.txt that I introduced (from internet sleuthing) and still hit the problem. Hard to believe that OpenMP is broken on the Macs; maybe I will try this on an Intel Mac just for comparison.
Thanks for your efforts.
James Conway
> I didn't use -DGUI=OFF or -DFETCH_TORCH_MODELS=OFF. The GUI worked OK, and I tend to use Relion that way. The torch stuff doesn't seem to make a difference.
This was only to save time.
> Hard to believe that OpenMP is broken on the Macs, maybe I will try this on an Intel Mac just for comparison.
Indeed a simple program to initialize several locks and get and release them worked fine.
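A minimal test along those lines (a sketch, not the exact program used; compile with e.g. g++-12 -fopenmp) would be:

// Sketch of a standalone OpenMP lock test: initialize several locks,
// then set/unset them from parallel threads. If omp_lock_t were broken,
// omp_set_lock would hang or the totals would be wrong.
#include <omp.h>
#include <cstdio>

int main()
{
    const int NLOCKS = 4;
    omp_lock_t locks[NLOCKS];
    long counters[NLOCKS] = {};

    for (int i = 0; i < NLOCKS; i++)
        omp_init_lock(&locks[i]);

    #pragma omp parallel for
    for (int i = 0; i < 100000; i++)
    {
        const int k = i % NLOCKS;
        omp_set_lock(&locks[k]);   // each counter is guarded by its own lock
        counters[k]++;
        omp_unset_lock(&locks[k]);
    }

    long total = 0;
    for (int i = 0; i < NLOCKS; i++)
    {
        total += counters[i];
        omp_destroy_lock(&locks[i]);
    }
    printf("total = %ld (expected 100000)\n", total);
    return 0;
}

Note that a standalone test compiled this way picks up gcc's own omp.h and the matching libgomp, so header and runtime agree.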
As a possible workaround, things seem to run using llvm from brew (in my case /opt/homebrew/Cellar/llvm/15.0.1) instead of gcc-12.
The real problem, at least in my system, seems to be multiple omp.h headers.
brew's libomp, which is the LLVM OpenMP library, installs omp.h in /opt/homebrew/include/omp.h.
gcc installs omp.h somewhere hidden like /opt/homebrew/Cellar/gcc/12.2.0/lib/gcc/current/gcc/aarch64-apple-darwin21/12/include/omp.h.
So if you have libomp installed on your system, you can end up including its header files even when compiling with gcc (which implicitly links against libgomp). It also happens that libomp has a different sizeof(omp_lock_t) than gcc's libgomp, resulting in memory corruption and all kinds of weird behaviour.
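A quick way to see this (a sketch; the exact sizes depend on the toolchain) is to print the size of the lock types and compare builds:

// Sketch: print the size of OpenMP's lock types.
// Build once with plain `g++-12 -fopenmp checklock.cpp` and once forcing
// libomp's header with `g++-12 -fopenmp -I/opt/homebrew/include checklock.cpp`.
// If the two builds report different sizes, the header and the runtime the
// binary actually links against can disagree about the lock layout.
#include <omp.h>
#include <cstdio>

int main()
{
    printf("sizeof(omp_lock_t)      = %zu\n", sizeof(omp_lock_t));
    printf("sizeof(omp_nest_lock_t) = %zu\n", sizeof(omp_nest_lock_t));
    return 0;
}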
Try to uninstall libomp and make a fresh compilation.
The following code should be able to detect such problems if placed after including omp.h:
#if defined(__APPLE__) && defined(__GNUC__)
#ifndef _LIBGOMP_OMP_LOCK_DEFINED
#error "Incompatible omp.h header included! Please make sure you are not using omp.h from libomp."
#endif
#endif
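With a guard like this, a gcc build that accidentally includes libomp's omp.h fails at compile time with the #error above instead of deadlocking at runtime; this assumes gcc's bundled omp.h defines _LIBGOMP_OMP_LOCK_DEFINED, which is what the check relies on.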
Great investigation!
Unfortunately, some brew packages depend on libomp, so it is not always possible to remove it. Can we somehow ask GCC to pick up the internal one?
You just need to make sure to add -I/opt/homebrew/Cellar/gcc/12.2.0/lib/gcc/current/gcc/aarch64-apple-darwin21/12/include/ (or something like that, depending on the exact gcc used) before -I/opt/homebrew/include/ during the compilation step, but I don't know if there's an easy way to do that automatically in CMake.
Hi, I saw @biochem-fan's tweet about this issue and was curious to find out more.
After installing some packages with brew install open-mpi cmake fltk fftw libomp jpeg and adding the following block to CMakeLists.txt, I successfully compiled Relion4 using only the M1 Mac's Apple Clang.
# ----------------------------------------------------------------------------OpenMP-- James Conway
# This block from: https://code-examples.net/en/q/10d10e9
# Use of -Xpreprocessor suggested here: https://stackoverflow.com/questions/40095958/apple-clang-fopenmp-not-working
OPTION (USE_OpenMP "Use OpenMP to enable <omp.h>" ON)
# Find OpenMP
if(APPLE AND USE_OpenMP)
    execute_process(COMMAND brew --prefix libomp
                    OUTPUT_VARIABLE OpenMP_HOME
                    OUTPUT_STRIP_TRAILING_WHITESPACE)
    message(STATUS "OpenMP Root : ${OpenMP_HOME}")
    set(OpenMP_libomp_LIBRARY "${OpenMP_HOME}/lib")
    if(CMAKE_C_COMPILER_ID MATCHES "Clang")
        set(OpenMP_C "${CMAKE_C_COMPILER}")
        set(OpenMP_C_FLAGS "-Xpreprocessor -fopenmp -Wno-unused-command-line-argument -I${OpenMP_HOME}/include -lomp -L${OpenMP_libomp_LIBRARY}" CACHE STRING "" FORCE)
        set(OpenMP_C_LIB_NAMES "libomp")
    endif()
    if(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
        set(OpenMP_CXX "${CMAKE_CXX_COMPILER}")
        set(OpenMP_CXX_FLAGS "-Xpreprocessor -fopenmp -Wno-unused-command-line-argument -I${OpenMP_HOME}/include -lomp -L${OpenMP_libomp_LIBRARY}" CACHE STRING "" FORCE)
        set(OpenMP_CXX_LIB_NAMES "libomp")
    endif()
endif()
if(USE_OpenMP)
    find_package(OpenMP REQUIRED)
endif(USE_OpenMP)
if(OpenMP_FOUND)
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
endif(OpenMP_FOUND)
Then,
mkdir -p build ; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=${HOME}/apps/relion/4.0 -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang -DGUI=OFF -DFETCH_TORCH_MODELS=OFF -DCMAKE_BUILD_TYPE=DEBUG
make -j8 install
Relion 4.0 was installed on my M1 Mac.
My test was here:
$ wget ftp://ftp.mrc-lmb.cam.ac.uk/pub/scheres/relion40_tutorial_precalculated_results.tar.gz
$ tar zxvf relion40_tutorial_precalculated_results.tar.gz
$ cd relion40_tutorial_precalculated_results
$ mkdir -p Class2D/vdam
$ ~/apps/relion/4.0/bin/relion_refine --o Class2D/vdam/test --grad --class_inactivity_threshold 0.1 --grad_write_iter 10 --iter 200 --i Extract/job012/particles.star --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 50 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --j 1
Running CPU instructions in double precision.
Initial subset size set to 200
Final subset size set to 1000
Estimating initial noise spectra from 1000 particles
0/ 0 sec ............................................................~~(,_,">
Estimating accuracies in the orientational assignment ...
2/ 2 sec ............................................................~~(,_,">
Auto-refine: Estimated accuracy angles= 30.1 degrees; offsets= 14.868 Angstroms
CurrentResolution= 56.64 Angstroms, which requires orientationSampling of at least 30 degrees for a particle of diameter 200 Angstroms
Oversampling= 0 NrHiddenVariableSamplingPoints= 43500
OrientationalSampling= 12 NrOrientations= 30
TranslationalSampling= 7.08 NrTranslations= 29
=============================
Oversampling= 1 NrHiddenVariableSamplingPoints= 1392000
OrientationalSampling= 6 NrOrientations= 240
TranslationalSampling= 3.54 NrTranslations= 116
=============================
Gradient optimisation iteration 1 of 200 with 200 particles (Step size 0.9)
41/ 41 sec ............................................................~~(,_,">
Maximization ...
0/ 0 sec ............................................................~~(,_,">
CurrentResolution= 45.312 Angstroms, which requires orientationSampling of at least 25.7143 degrees for a particle of diameter 200 Angstroms
Oversampling= 0 NrHiddenVariableSamplingPoints= 43500
OrientationalSampling= 12 NrOrientations= 30
TranslationalSampling= 7.08 NrTranslations= 29
=============================
Oversampling= 1 NrHiddenVariableSamplingPoints= 1392000
OrientationalSampling= 6 NrOrientations= 240
TranslationalSampling= 3.54 NrTranslations= 116
=============================
Gradient optimisation iteration 2 of 200 with 200 particles (Step size 0.9)
49/ 49 sec ............................................................~~(,_,">
Maximization ...
0/ 0 sec ............................................................~~(,_,">
CurrentResolution= 45.312 Angstroms, which requires orientationSampling of at least 25.7143 degrees for a particle of diameter 200 Angstroms
Oversampling= 0 NrHiddenVariableSamplingPoints= 43500
OrientationalSampling= 12 NrOrientations= 30
TranslationalSampling= 7.08 NrTranslations= 29
=============================
Oversampling= 1 NrHiddenVariableSamplingPoints= 1392000
OrientationalSampling= 6 NrOrientations= 240
TranslationalSampling= 3.54 NrTranslations= 116
=============================
Gradient optimisation iteration 3 of 200 with 200 particles (Step size 0.9)
48/ 48 sec ............................................................~~(,_,">
Maximization ...
0/ 0 sec ............................................................~~(,_,">
It seems the deadlock was resolved.
- System Info
- macOS 12.6 (current), Xcode 14.0 (14A309)
- Apple clang version 14.0.0 (clang-1400.0.29.102), Target: arm64-apple-darwin21.6.0
@YoshitakaMo The deadlock only happens with gcc.
The following hack:
if(APPLE AND CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
    get_property(dirs DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR} PROPERTY INCLUDE_DIRECTORIES)
    list(FIND dirs "/opt/homebrew/include" index)
    if(${index} GREATER -1)
        if(EXISTS "/opt/homebrew/include/omp.h")
            # If omp.h from libomp exists in the include path and we're using gcc,
            # move it to the end of the include path search list to ensure
            # we get omp.h from gcc's internal headers
            set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -idirafter /opt/homebrew/include")
            set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -idirafter /opt/homebrew/include")
        endif()
    endif()
endif(APPLE AND CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
when added after https://github.com/3dem/relion/blob/1569f02f26b065459c1ee3b4ea4186c228acb70e/src/apps/CMakeLists.txt#L443 fixes the issue, but it's not very pretty...
I created a minimum working example of the Homebrew libomp's GCC incompatibility issue.
https://gist.github.com/biochem-fan/31864239460769d2a4a3585e4959d298
As discussed in the homebrew-core repo, if Homebrew's libomp is made keg-only, we can control which OpenMP library is used via environment variables such as CFLAGS/CXXFLAGS.
@YoshitakaMo I guess gcc and llvm from brew won't need CFLAGS, because their own OpenMP headers are located in their standard search path (e.g. /opt/homebrew/Cellar/gcc/12.2.0/lib/gcc/current/gcc/aarch64-apple-darwin21/12/include/omp.h, /opt/homebrew/Cellar/llvm/15.0.1/lib/clang/15.0.1/include/omp.h).
The combination of AppleClang (from Xcode) and libomp from brew needs an explicit path.
In an ideal world FindOpenMP.cmake would set OpenMP_C/CXX_INCLUDE_DIRS appropriately so we would only need to add that to include_directories, but this would definitely need to be tested (and I don't have much faith it will work "out of the box").
To compile with Apple Clang + libomp from HomeBrew without patching CMakeLists.txt:
brew install libomp
OMPPATH=`brew --prefix libomp`
cmake .. -DGUI=NO -DFETCH_TORCH_MODELS=OFF \
-DCMAKE_C_FLAGS="-Xpreprocessor -fopenmp -Wno-unused-command-line-argument -I${OMPPATH}/include -lomp -L${OMPPATH}/lib" \
-DCMAKE_CXX_FLAGS="-Xpreprocessor -fopenmp -Wno-unused-command-line-argument -I${OMPPATH}/include -lomp -L${OMPPATH}/lib"
make -j6
Tests:
~/prog/relion/build-appleclang/bin/relion_refine --o Class2D/em --iter 25 --i Extract/job009/particles.star --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 20 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --offset_range 5 --offset_step 2 --norm --scale --j 8
~/prog/relion/build-appleclang/bin/relion_refine --o Class2D/vdam --grad --class_inactivity_threshold 0.1 --grad_write_iter 10 --iter 200 --i Extract/job009/particles.star --dont_combine_weights_via_disc --pool 30 --pad 2 --ctf --tau2_fudge 2 --particle_diameter 200 --K 50 --flatten_solvent --zero_mask --center_classes --oversampling 1 --psi_step 12 --j 8
Neither deadlocks.
I think this gives sufficient options to write macOS installation instructions, steering people away from gcc for the moment.
Thanks for all the sleuthing. I was also successful setting CC and CXX to the system clang compiler:
setenv CC /usr/bin/gcc
setenv CXX /usr/bin/g++
same with OMPI_CC and OMPI_CXX, and (as above):
setenv OMPPATH `brew --prefix libomp`
and set these, not sure if they are required:
setenv PATH "/opt/homebrew/opt/openmpi/bin:${PATH}"
setenv CXXFLAGS "-I/opt/homebrew/opt/openmpi/include"
setenv LDFLAGS "-L/opt/homebrew/opt/openmpi/lib"
Then the cmake command as above. I have tested 2D classification (VDAM), which now completes with no crashes, and the old EM method is running, now past the previous crash.
Thanks again!
You don't have to set CC, CXX, OMPI_CC, OMPI_CXX, etc., because gcc is the default and /usr/bin/gcc points to Apple Clang.
Hi all, I made a Homebrew formula according to the discussion above.
# install homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# install Relion 4.0.0
brew install wget
wget https://raw.githubusercontent.com/YoshitakaMo/homebrew-bio/addrelion/Formula/relion.rb
brew install ./relion.rb --build-from-source --verbose --keep-tmp
After a few minutes, Relion 4.0.0 will be installed. The deadlock issue is solved on both M1 and Intel Macs.
There is a small issue in applying this formula to the Homebrew repository. Currently, the installed script relion_class_ranker.py is not executable by default. Can this issue be fixed?
$ brew audit --new /opt/homebrew/opt/relion/.brew/relion.rb
relion:
* Non-executables were installed to "/opt/homebrew/opt/relion/bin".
The offending files are:
/opt/homebrew/opt/relion/bin/relion_class_ranker.py
Error: 1 problem in 1 formula detected
I was just helping someone and noticed that this .rb file has the GUI turned off. Removing '<< "-DGUI=NO"' from the relion.rb file allows the GUI to compile and run. I haven't tested whether everything works.
@charlie-bond Thank you for your comment. I've now removed the arg from https://raw.githubusercontent.com/YoshitakaMo/homebrew-bio/addrelion/Formula/relion.rb to allow the GUI.
Relion 4.0.0 is now available on Homebrew! Just type brew install brewsci/bio/relion in your terminal if Homebrew is installed. It's not GPU-accelerated, since Homebrew is not designed for Linux with GPUs, but it is very useful for macOS users.