kmer prefilter died, crashed in Sequence destructor with memory free checksum error
Running mmseqs search on a pair of sequences against the Colabfold env database fails on a Mac (ARM, 64 GB of memory) with a "prefilter died" message. The Mac crash report shows that it crashed in the C++ Sequence::~Sequence() destructor, inside the macOS functions free_list_checksum_botch() and malloc_zone_error(). In other bug reports these are associated with the error "Incorrect checksum for freed object: probably modified after being freed." (https://github.com/apache/arrow/issues/40652).
The crash does not seem to be caused by too little memory, since it also happens with the mmseqs option --split-memory-limit 32G on this computer with 64 GB. The crash happens with mmseqs release 18 and release 17.
../mmseqs-r18/bin/mmseqs search msas/prof_res ../colabfold_databases/colabfold_envdb_202108_db msas/res_env msas/tmp3 --num-iterations 3 --db-load-mode 0 -a -e 0.1 --max-seqs 10000 --prefilter-mode 0 --k-score 'seq:96,prof:80' >& mmseqs2.out &
Final stack frames from Mac crash dump:
{"imageOffset":37768,"symbol":"__pthread_kill","symbolLocation":8,"imageIndex":1},
{"imageOffset":26764,"symbol":"pthread_kill","symbolLocation":296,"imageIndex":2},
{"imageOffset":494140,"symbol":"abort","symbolLocation":124,"imageIndex":3},
{"imageOffset":57716,"symbol":"malloc_vreport","symbolLocation":892,"imageIndex":4},
{"imageOffset":235860,"symbol":"malloc_zone_error","symbolLocation":100,"imageIndex":4},
{"imageOffset":114292,"symbol":"free_list_checksum_botch","symbolLocation":40,"imageIndex":4},
{"imageOffset":29176,"symbol":"small_free_list_remove_ptr_no_clear","symbolLocation":964,"imageIndex":4},
{"imageOffset":18076,"symbol":"free_small","symbolLocation":632,"imageIndex":4},
{"imageOffset":1369764,"symbol":"Sequence::~Sequence()","symbolLocation":180,"imageIndex":0},
I've attached the full mmseqs log output, the input mmseqs db files (prof_res), and the full Mac crash dump. The Colabfold env database is the standard colabfold_envdb_202108.tar.gz (https://wwwuser.gwdg.de/~compbiol/colabfold/colabfold_envdb_202108.tar.gz). The Mac is running macOS 15.6.1 on a Mac Studio M2 Ultra. I did not see this on Linux (Ubuntu 24.04 with 64 GB of memory), although possibly its malloc implementation does not do free-list checksum detection. My guess is that the memory corruption happens on all platforms but some are more tolerant of it.
Hello,
We tried reproducing the issue on a macOS ARM64 system as well as on an Ubuntu 24.04 system with an Intel x86_64 CPU, and so far we have been unsuccessful: the search terminated without issues on both setups.
Could you please tell us how you built/installed your mmseqs binary?
The mmseqs binary I used is the release 18 precompiled distribution from the mmseqs2 github
https://github.com/soedinglab/MMseqs2/releases/download/18-8cc5c/mmseqs-osx-universal.tar.gz
It may be that the crash is due to a multi-threading issue that depends on the number of cores on the machine. The Mac I used is a Mac Studio M2 Ultra with 24 cores and 64 GB of memory. It reproduces every time (about 5 tries) on this machine. But if it is caused by the order of execution in multiple threads, it would not be surprising if it did not reproduce on a Mac with different hardware.
I may be able to debug this a bit further. The mmseqs search runs mmseqs prefilter, I think 18 or 19 times, before it crashes. I could try to isolate the test case to the final mmseqs prefilter run that fails. That would reduce the runtime (I recall it takes tens of minutes to reach the crash), and then I could try running single-threaded to see whether it still crashes or whether it is sensitive to the number of threads. In any case, with a heap corruption bug in multi-threaded code, finding the off-by-one array index or the non-atomic operation or locking problem behind it is likely to be a nightmare.
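To make the suspected class of bug concrete, here is a generic C++ illustration. It is not MMseqs2 code, and the function name countWithAtomic is hypothetical. When several threads update shared state through a plain, non-atomic variable, the result is a data race whose effect depends on thread timing and core count, which would fit a crash that reproduces on a 24-core M2 Ultra but not on other machines. Making each update atomic (or guarding it with a lock) removes the race:

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration, not MMseqs2 code.
// Each of numThreads threads adds perThread increments to one shared counter.
// With a plain `long` and `++counter`, the read-modify-write steps of different
// threads could interleave (a data race, i.e. undefined behavior), so the result
// would depend on thread timing. std::atomic makes every increment indivisible.
long countWithAtomic(int numThreads, int perThread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(numThreads));
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i) {
                // Relaxed ordering is enough here: only the final total matters.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join(); // join() synchronizes, so the load below sees every increment
    }
    return counter.load();
}
```

For example, countWithAtomic(24, 100000) returns 2400000 on every run, regardless of how the threads are scheduled. A race on a pointer, size, or reference count instead of a simple counter is the kind of mistake that can end in a double free or a write after free, which macOS malloc reports as a free-list checksum error.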
The reason I was running this on a Mac at all is because I develop ChimeraX and am attempting to allow our 20,000 users to do structure prediction with Boltz on their normal work computers using locally computed sequence alignments to avoid the frequent Colabfold server bottlenecks in running large numbers of predictions.
As I noted in my original description I do not observe the crash on Ubuntu 24.04, only on the Mac.
We are running out of ideas to reproduce the issue on our side. If you have time, could you please try to compile and run with Address Sanitizer-enabled MMseqs2?
There are instructions in the wiki https://github.com/soedinglab/MMseqs2/wiki/MMseqs2-Developer-Guide#sanitizers on how to compile with ASan.
I tried building mmseqs2 on my Mac following the instructions in the user guide (without ASan)
https://mmseqs.com/latest/userguide.pdf
brew install cmake libomp zlib bzip2
./util/build_osx.sh ~/ucsf/MMseqs2 ~/ucsf/mmseqs2_build >& build.txt
and it failed with missing symbol __kmpc_dispatch_deinit
[100%] Linking CXX executable mmseqs
Undefined symbols for architecture x86_64:
"___kmpc_dispatch_deinit", referenced from:
Alignment::run(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned long, unsigned long, bool) (.omp_outlined) in libmmseqs-framework.a[2](Alignment.cpp.o)
doRescorediagonal(Parameters&, DBWriter&, DBReader<unsigned int>&, unsigned long, unsigned long) (.omp_outlined) in libmmseqs-framework.a[11](rescorediagonal.cpp.o)
fwbw(int, char const**, Command const&) (.omp_outlined) in libmmseqs-framework.a[12](Fwbw.cpp.o)
AlignmentSymmetry::readInData(DBReader<unsigned int>*, DBReader<unsigned int>*, unsigned int**, unsigned short**, int, unsigned long*) (.omp_outlined) in libmmseqs-framework.a[14](AlignmentSymmetry.cpp.o)
AlignmentSymmetry::findMissingLinks(unsigned int**, unsigned long*, unsigned long, int) (.omp_outlined) in libmmseqs-framework.a[14](AlignmentSymmetry.cpp.o)
AlignmentSymmetry::sortElements(unsigned int**, unsigned long*, unsigned long) (.omp_outlined) in libmmseqs-framework.a[14](AlignmentSymmetry.cpp.o)
ClusteringAlgorithms::execute(int) (.omp_outlined) in libmmseqs-framework.a[16](ClusteringAlgorithms.cpp.o)
...
ld: symbol(s) not found for architecture x86_64
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [src/mmseqs] Error 1
make[1]: *** [src/CMakeFiles/mmseqs.dir/all] Error 2
make: *** [all] Error 2
Full output build.txt is attached. My Mac has an M2 CPU and if I eliminate the x86_64 build from build_osx.sh and just do the ARM build it fails with the same missing symbol __kmpc_dispatch_deinit.
The brew-installed libomp is version 21.1.3. The macOS version is 15.6.1, the Xcode version is 15.1, and the clang version is 17.0.0.
$ brew install libomp
Warning: libomp 21.1.3 is already installed and up-to-date.
To reinstall 21.1.3, run:
  brew reinstall libomp
I don't mind helping out by building and testing with ASan if the build worked.
I think the issue should be fixed in the latest git version. However, we just merged a large change into MMseqs2, so please use a git commit from a couple of days ago:
git clone https://github.com/soedinglab/MMseqs2.git
cd MMseqs2
git checkout befcb1130a73ddba247fe3eaa504e7ffb790d150
mkdir build
cd build
cmake -DHAVE_SANITIZER=1 -DCMAKE_BUILD_TYPE=ASan ..
make -j $(sysctl -n hw.ncpu)
This should work (hopefully).
Forgot to mention: I was building release 18 of mmseqs, not the current code, when I got the missing-symbol error.
My plan was to reproduce the crash in my local build, then enable ASan.
I did not want to jump immediately to newer code, since the bug is hard to reproduce and may depend on the thread execution order, which any code update may disturb.
The instructions above should work with 18, just do git checkout 18-8cc5c instead
I built mmseqs2 release 18-8cc5c as you instructed, but without OpenMP. It needed the additional cmake option -DREQUIRE_OPENMP=0, but then it built with no problem. I ran the crashing mmseqs command with the GitHub Mac release 18 binary to make sure it reproduces. Surprisingly, it completed successfully on the first try, but then failed on the second try. Each run took about 20 minutes.

Then I started the exact same command using my locally compiled mmseqs 18-8cc5c with ASan. Surprisingly, while the GitHub prebuilt version starts off with "Process prefiltering step 1 of 18", the local ASan build starts off with "Process prefiltering step 1 of 10". Every other parameter listed in the mmseqs output is identical, so I don't understand why it is running 10 splits instead of 18. At any rate, the first of 3 x 18 "counting k-mers" steps in the GitHub build took 10 seconds to complete, while my locally built ASan mmseqs has been running the first of 3 x 10 "counting k-mers" steps for 2 hours and the progress bar looks to be at about 95%. So the ASan mmseqs appears to be running about 400 times slower, which suggests it will take 8000 minutes (= 20 min * 400), or about 5.5 days, to finish. CPU utilization is only 100% (1 core) and memory use is modest (21 GB), so I'll let it run and see what happens.
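The 400x figure can be reconstructed from the numbers above if one assumes (this is an assumption, not stated in the logs) that each of the 10 ASan splits holds about 18/10 as much data as each of the 18 release-build splits, so a single "counting k-mers" pass does about 1.8 times as much work. This ignores that the pass was only about 95% complete:

$$
\text{slowdown} \approx \frac{7200\ \text{s}}{10\ \text{s}} \times \frac{10}{18} = 400,
\qquad
20\ \text{min} \times 400 = 8000\ \text{min} \approx 133\ \text{h} \approx 5.56\ \text{days}
$$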
LIBOMP=$(brew --prefix libomp)
cmake -DHAVE_SANITIZER=1 -DCMAKE_BUILD_TYPE=ASan \
  -DOpenMP_C_FLAGS="-Xpreprocessor -fopenmp -I${LIBOMP}/include" \
  -DOpenMP_C_LIB_NAMES=omp \
  -DOpenMP_CXX_FLAGS="-Xpreprocessor -fopenmp -I${LIBOMP}/include" \
  -DOpenMP_CXX_LIB_NAMES=omp \
  -DOpenMP_omp_LIBRARY=${LIBOMP}/lib/libomp.a ..
This should work for multithreaded compilation with AppleClang
The splits are decided by the amount of available RAM at the start of program execution. I guess the ASan build reserves more memory, and MMseqs2 then decides it needs to do more splits since less RAM is available.
Your suggested compile with LIBOMP=... did not work: cmake succeeds, but make fails with
[ 34%] Building CXX object src/CMakeFiles/mmseqs-framework.dir/alignment/Alignment.cpp.o
In file included from /Users/goddard/ucsf/MMseqs2/src/alignment/Alignment.cpp:1:
In file included from /Users/goddard/ucsf/MMseqs2/src/alignment/Alignment.h:4:
In file included from /Users/goddard/ucsf/MMseqs2/src/commons/IndexReader.h:6:
In file included from /Users/goddard/ucsf/MMseqs2/src/prefiltering/PrefilteringIndexReader.h:5:
In file included from /Users/goddard/ucsf/MMseqs2/src/prefiltering/IndexTable.h:19:
In file included from /Users/goddard/ucsf/MMseqs2/src/commons/FastSort.h:9:
In file included from /Users/goddard/ucsf/MMseqs2/lib/ips4o/ips4o.hpp:38:
In file included from /Users/goddard/ucsf/MMseqs2/lib/ips4o/ips4o/ips4o.hpp:44:
In file included from /Users/goddard/ucsf/MMseqs2/lib/ips4o/ips4o/config.hpp:45:
/Users/goddard/ucsf/MMseqs2/lib/ips4o/ips4o/thread_pool.hpp:38:10: fatal error: 'omp.h' file not found
   38 | #include <omp.h>
The file ${LIBOMP}/include/omp.h exists, but I guess the include with angle brackets <omp.h> only looks in system locations, so it does not find it. I'm not familiar enough with cmake to quickly find the magic to fix this.
Your explanation about the number of splits does not make sense: if less RAM were available, there would be more splits, not fewer. Maybe the estimated memory use depends on the number of threads. Since the non-OpenMP build presumably cannot use multiple threads, maybe that is why it decides it can use fewer splits. Both the GitHub binary and my local binary estimate 45 GB of memory use, with 18 splits and 10 splits respectively.
The single threaded ASan crash test has been running for 1 day and is about 20% done as estimated. I think it is probably a waste of time because the memory corrupting that causes this crash is very likely a thread safety problem which won't manifest with only a single thread.
Ah, yes I misread what you wrote. We reserve quite a bit of thread memory, so your explanation makes much more sense.
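The split-count discussion can be illustrated with a deliberately simplified, hypothetical model. This is not MMseqs2's actual memory estimate; the function estimateSplits and all numbers below are made up for illustration. If a fixed buffer is reserved per thread before the remaining RAM is used for the k-mer index, more threads leave less room per split, so the same index needs more splits. That is consistent with the single-threaded non-OpenMP build choosing 10 splits and the 24-thread build choosing 18:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model, not the actual MMseqs2 formula.
// indexBytes:     total size of the k-mer index for the whole target database
// availableBytes: free RAM detected at program start
// perThreadBytes: memory reserved for each worker thread
// Returns the number of splits needed so that one split's index fits in the
// RAM left over after the per-thread reservation, or 0 if nothing is left.
std::size_t estimateSplits(std::uint64_t indexBytes, std::uint64_t availableBytes,
                           unsigned threads, std::uint64_t perThreadBytes) {
    const std::uint64_t reserved = static_cast<std::uint64_t>(threads) * perThreadBytes;
    if (reserved >= availableBytes) {
        return 0; // no room left for any part of the index
    }
    const std::uint64_t usable = availableBytes - reserved;
    // Ceiling division: the smallest split count whose per-split index fits in `usable`.
    return static_cast<std::size_t>((indexBytes + usable - 1) / usable);
}
```

With these made-up numbers (a 100 GiB index, 48 GiB available, 1.5 GiB reserved per thread), estimateSplits returns 3 for one thread and 9 for 24 threads: the reserved thread memory, not the index size, drives the difference.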
Again, I sadly cannot reproduce the compilation issue :( It seems to work fine here. Can you try to compile with make VERBOSE=1 and paste the compiler call that is failing here?
My mistake. The verbose output showed no -I compile option for libomp, and I see cmake didn't get LIBOMP set to the Homebrew OpenMP path. I must have mistyped something. I reran cmake, it found the Homebrew location, and mmseqs compiled. I've started the prefilter crash test with ASan and OpenMP, now running with 24 threads, and it is using 18 splits exactly as in the non-ASan run, so hopefully we will get a crash with better debugging info.
I tried to reproduce the mmseqs prefilter crash with ASan and locally compiled release 18 code; unfortunately it ran successfully, taking 36 hours. I will try compiling locally without ASan and see if that crashes.
I compiled release 18 mmseqs2 with OpenMP but without ASan on the Mac Studio used in all the tests described above and ran the crash test twice; both runs completed successfully. It seems only the Mac mmseqs release 18 binary on GitHub exhibits the crash. I think we are at a dead end with this ticket and it should probably be closed.
Unfortunately, when I use my locally compiled Mac mmseqs2 release 18 for a colabfold_search run with 476 query sequences, it again crashes in the prefilter of the env database. I'm afraid I've spent too much time trying to debug mmseqs on Mac, so I am giving up on it for now. It crashed after about 3 hours without ASan, so it is probably not feasible to run it under ASan.
You may have better luck reproducing this crash with this heavier use of mmseqs by colabfold_search on Mac. I'll attach the 476 query sequences. Here is the colabfold_search command I used:
colabfold_search mgen_476.fasta ../colabfold_databases msas --mmseqs ../mmseqs --merge-a3m 0 --pre-pairing >& colabfold_search.out
It uses some modifications I made to the colabfold_search Python code (the --merge-a3m and --pre-pairing options), but those should not affect the crash. I use the standard Colabfold uniref and env databases. The attached file contains the sequences (mgen_476.fasta) and the output from colabfold_search, which ends in the mmseqs prefilter crash.