Call segfaulting only on Linux
I am not sure the best way to frame this but I am testing a build function here:
import unittest
import bayeswavecpp_bindings.autoload_cppyy
import cppyy.gbl as Cpp  # here is where bayeswave functions are loaded
import numpy as np

class runBuilderTests(unittest.TestCase):
    def setUp(self):
        split_command_line = COMMAND_LINE.split()
        # RunBuilder's arguments are (int argc, char** argv),
        # e.g. the length of character arrays and a pointer to character arrays
        # shockingly, using cppyy this is as easy as passing the list of argument strings and its length
        # note, Cpp.std.make_unique is not called on dataBuilder
        dataBuilder = Cpp.LalDataBuilder(len(split_command_line), split_command_line)
        self.runBuilder = Cpp.RunBuilder(len(split_command_line), split_command_line, Cpp.std.move(dataBuilder))
        self.runBuilder.__python_owns__ = False
        self.run = self.runBuilder.build()
        self.run.__python_owns__ = False

    def test_evolve_run(self):
        """
        Running bayeswave from python
        :return:
        """
        print("Testing running full MCMC")
        self.run.evolveStateAllCycles()
The same code in C++ looks like this
int main(int argc, char** argv) {
    Version::printCodeVersion(std::cout);
    if (argc == 2) {
        RunBuilder::printHelpMessage();
        return 0;
    }
    auto dataBuilder = std::make_unique<LalDataBuilder>(argc, argv);
    RunBuilder runBuilder{argc, argv, std::move(dataBuilder)};
    auto run = runBuilder.build();
    run->evolveStateAllCycles();
}
When I run the test case on my laptop (Mac) it runs perfectly, but when I run it on a cluster (Linux) the test segfaults when runBuilder.build() is called.
Inside of runBuilder the call looks like this:
RunBuilder::RunBuilder(int argc, char** argv, std::unique_ptr<Builder<Data>>&& dataBuilder) : commandLineInput_{argc, argv}, dataBuilder_{std::move(dataBuilder)} {
    // data_ and chainCollection_ are default constructed as null pointers
    if (dataBuilder_ == nullptr) {
        throw std::invalid_argument{
            "Attempted to construct a RunBuilder with a null dataBuilder; "
            "to construct a RunBuilder, pass in a non-null rvalue reference to a std::unique_ptr<Builder<Data>> containing the object with which the RunBuilder can build its Run's Data"};
    }
    // TODO: LALInferenceReadData does not make t-domain data when simulating data
}

std::unique_ptr<Run> RunBuilder::build() {
    if (hasAlreadyBuilt_) {
        throw std::logic_error("Called build() on a RunBuilder multiple times; build() may only be called once");
    }
    data_ = dataBuilder_->build();
    ...
build() is supposed to call dataBuilder_->build(). I added print statements inside dataBuilder_->build() and nothing gets printed, so it looks like runBuilder is unable to call the dataBuilder in the first place. However, the actual traceback has little to do with dataBuilder.
The traceback is this:
(ame) [sophie.hourihane@ldas-pcdev1 python_binding_tests]$ python test_cppyy_RunBuilder.py
.Set trigtime to 1168989748.0000000000
Using 0.400000 seconds of padding for IFO H1
Using 0.400000 seconds of padding for IFO L1
*** Break *** segmentation violation
Thread 8 (Thread 0x7f1d4a605700 (LWP 867381)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f1d52e06700 (LWP 867380)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f1d5b607700 (LWP 867379)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f1d63e08700 (LWP 867378)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f1d6c609700 (LWP 867377)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f1d74e0a700 (LWP 867376)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f1d7560b700 (LWP 867375)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f1d86081b80 (LWP 867351)):
#0 0x00007f1d84fef612 in waitpid () from /lib64/libc.so.6
#1 0x00007f1d84f51ce7 in do_system () from /lib64/libc.so.6
#2 0x00007f1d8484eb65 in CppyyLegacy::TUnixSystem::StackTrace() () from /home/sophie.hourihane/.conda/envs/ame/lib/python3.10/site-packages/cppyy_backend/lib/libCoreLegacy.so
#3 0x00007f1d7cb87e48 in (anonymous namespace)::TExceptionHandlerImp::HandleException(int) () from /home/sophie.hourihane/.conda/envs/ame/lib/python3.10/site-packages/cppyy_backend/lib/libcppyy_backend.so
#4 0x00007f1d8484d861 in CppyyLegacy::TUnixSystem::DispatchSignals(CppyyLegacy::ESignals) () from /home/sophie.hourihane/.conda/envs/ame/lib/python3.10/site-packages/cppyy_backend/lib/libCoreLegacy.so
#5 <signal handler called>
#6 0x00007f1d7a9877c5 in RunBuilder::build (this=0x557d26a22d30) at /home/sophie.hourihane/.conda/envs/ame/x86_64-conda-linux-gnu/include/c++/11.4.0/ext/unconditional_prior_distribution.ipp:421
#7 0x00007f1d38d82028 in ?? ()
#8 0x0000557d269dc1c0 in ?? ()
#9 0x00007fffb5b082c0 in ?? ()
#10 0x0000000000000000 in ?? ()
*** Break *** segmentation violation
Thread 8 (Thread 0x7f1d4a605700 (LWP 867381)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f1d52e06700 (LWP 867380)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f1d5b607700 (LWP 867379)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f1d63e08700 (LWP 867378)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f1d6c609700 (LWP 867377)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f1d74e0a700 (LWP 867376)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f1d7560b700 (LWP 867375)):
#0 0x00007f1d85c5b45c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f1d7749406c in blas_thread_server () from /home/sophie.hourihane/.conda/envs/ame/lib/././libcblas.so.3
#2 0x00007f1d85c551ca in start_thread () from /lib64/libpthread.so.0
#3 0x00007f1d84f2fe73 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f1d86081b80 (LWP 867351)):
#0 0x00007f1d84fef612 in waitpid () from /lib64/libc.so.6
#1 0x00007f1d84f51ce7 in do_system () from /lib64/libc.so.6
#2 0x00007f1d8484eb65 in CppyyLegacy::TUnixSystem::StackTrace() () from /home/sophie.hourihane/.conda/envs/ame/lib/python3.10/site-packages/cppyy_backend/lib/libCoreLegacy.so
#3 0x00007f1d7cb87cc5 in (anonymous namespace)::TExceptionHandlerImp::HandleException(int) () from /home/sophie.hourihane/.conda/envs/ame/lib/python3.10/site-packages/cppyy_backend/lib/libcppyy_backend.so
#4 0x00007f1d8484d861 in CppyyLegacy::TUnixSystem::DispatchSignals(CppyyLegacy::ESignals) () from /home/sophie.hourihane/.conda/envs/ame/lib/python3.10/site-packages/cppyy_backend/lib/libCoreLegacy.so
#5 <signal handler called>
#6 0x00007f1d7a9877c5 in RunBuilder::build (this=0x557d26a22d30) at /home/sophie.hourihane/.conda/envs/ame/x86_64-conda-linux-gnu/include/c++/11.4.0/ext/unconditional_prior_distribution.ipp:421
#7 0x00007f1d38d82028 in ?? ()
#8 0x0000557d269dc1c0 in ?? ()
#9 0x00007fffb5b082c0 in ?? ()
#10 0x0000000000000000 in ?? ()
Which is confusing for many reasons:
- I am explicitly using a single thread, so why are there multiple threads in the traceback? (It fails with the same error when threading is turned on.)
- unconditional_prior_distribution.ipp is not called by dataBuilder (it is called later by runBuilder).
- This exact call works from C++ on the Linux machine (and from both Python and C++ on my Mac).
If you have any pointers for what I am doing wrong that would be great. I am hoping I am maybe just treating std::make_unique incorrectly?
Thank you!
The multiple threads are started by BLAS, probably because of OpenMP; there should be a simple way of either switching that off or setting the number of threads to 1.
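One common way to do that is through environment variables, set before NumPy (or anything else that loads BLAS) is imported. The sketch below is an assumption rather than something I have checked against your setup: the blas_thread_server frames suggest the libcblas.so.3 in your conda environment is OpenBLAS, which reads OPENBLAS_NUM_THREADS, while OMP_NUM_THREADS and MKL_NUM_THREADS cover OpenMP and MKL builds. The helper name limit_blas_threads is made up for illustration.

```python
import os


def limit_blas_threads(n_threads=1):
    """Set the usual BLAS/OpenMP thread-count variables (hypothetical helper).

    This only has an effect if it runs before the BLAS library is loaded,
    i.e. before `import numpy` or the cppyy bindings pull it in.
    """
    value = str(n_threads)
    names = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")
    for name in names:
        os.environ[name] = value
    # Return what was set so the caller can log or check it.
    return {name: os.environ[name] for name in names}


limits = limit_blas_threads(1)
# import numpy as np  # import BLAS users only after the limits are set
```

Exporting the same variables in the shell before running the test has the same effect. Either way, the idle BLAS worker threads in the traceback are most likely unrelated to the segfault; only Thread 1 shows the crash.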
As for the crash, the C++ code is:
auto dataBuilder = std::make_unique<LalDataBuilder>(argc, argv);
RunBuilder runBuilder{argc, argv, std::move(dataBuilder)};
auto run = runBuilder.build();
but the Python code is:
dataBuilder = Cpp.LalDataBuilder(len(split_command_line), split_command_line)
self.runBuilder = Cpp.RunBuilder(len(split_command_line), split_command_line, Cpp.std.move(dataBuilder))
self.run = self.runBuilder.build()
which is missing the std.make_unique call.
That means std.move is applied to the LalDataBuilder object itself in Python, whereas in C++ it is applied to the std::unique_ptr<LalDataBuilder> object. std::move is only a cast; it leads to a call of the move constructor only if one is needed. But does LalDataBuilder have a move constructor and, if so, is it clearing state it shouldn't?