
Segmentation Fault on OS X

Open peterswang opened this issue 9 years ago • 4 comments

On OS X 10.10.3, with Python 2.7.10 and boost and boost-python 1.58.0. Built using:

cmake .. -DCMAKE_BUILD_TYPE=Release -DDATA_DIR=~/test_images/coco-master/images/val2014 -DUSE_PYTHON=2
make -j9

Saw some warnings only, such as:

In file included from /Users/peterwang/CPP_Resources/lpo-release/lib/crf/crf.cpp:31:
/Users/peterwang/CPP_Resources/lpo-release/external/ibfs/ibfs.h:161:2: warning: 'Node' defined as a class here but previously declared as a struct [-Wmismatched-tags]
        class Node
        ^
/Users/peterwang/CPP_Resources/lpo-release/external/ibfs/ibfs.h:150:2: note: did you mean class here?
        struct Node;
        ^~~~~~
        class

Tried:

python train_lpo.py -f0 0.2 ../models/lpo_VOC_0.2.dat

and got:

Segmentation fault: 11

This appears to have crashed on "from python.lpo import *" in lpo.py.
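One way to narrow down which import dies is to run each import in a throwaway child interpreter, so the crash shows up as an exit status instead of taking the whole script down. The following is only a sketch: the helper names `run_isolated` and `describe` are made up for illustration, and it assumes it is started from the directory where "from python.lpo import *" normally works.

```python
import signal
import subprocess
import sys

# Map signal numbers to names without relying on Python 3 only APIs.
SIGNAL_NAMES = {
    int(getattr(signal, name)): name
    for name in dir(signal)
    if name.startswith("SIG") and not name.startswith("SIG_")
}


def run_isolated(code, python=sys.executable):
    """Run `code` in a fresh interpreter and return its exit status."""
    return subprocess.call([python, "-c", code])


def describe(returncode):
    """Turn an exit status into a short human-readable verdict."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX a negative status means the child was killed by a signal.
        return "killed by " + SIGNAL_NAMES.get(-returncode, str(-returncode))
    return "exited with status %d" % returncode


if __name__ == "__main__":
    # First numpy on its own, then the extension module from the report.
    for stmt in ("import numpy.core.multiarray", "from python.lpo import *"):
        print("%-32s %s" % (stmt, describe(run_isolated(stmt))))
```

If the numpy import alone is fine and only the second line is "killed by SIGSEGV", the problem is more likely in how lpo.so is loaded than in numpy itself.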

Crash report contained:

...
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 ??? 000000000000000000 0 + 0
1 org.python.python 0x0000000103d150dd PyEval_GetGlobals + 23
2 org.python.python 0x0000000103d2462b PyImport_Import + 137
3 org.python.python 0x0000000103d22d27 PyImport_ImportModule + 31
4 lpo.so 0x00000001033f45a3 init_numpy() + 19
5 lpo.so 0x00000001033f4779 defineUtil() + 25
6 lpo.so 0x00000001033f4499 init_module_lpo() + 9
7 libboost_python-mt.dylib 0x0000000103c36391 boost::python::handle_exception_impl(boost::function0) + 81
8 libboost_python-mt.dylib 0x0000000103c373b9 boost::python::detail::init_module(char const*, void (*)()) + 121
9 org.python.python 0x0000000101836327 _PyImport_LoadDynamicModule + 140
...
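A quick way to read a report like this is to group the frames by image name and look at the address range each image covers. The sketch below does that for the frames quoted above; the function name `address_spans` is hypothetical and the regular expression assumes the "index image address symbol" layout shown in the report.

```python
import re
from collections import defaultdict

# One frame per line: index, image name, hexadecimal address, symbol.
FRAME = re.compile(r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)")

REPORT = """\
0 ??? 000000000000000000 0 + 0
1 org.python.python 0x0000000103d150dd PyEval_GetGlobals + 23
2 org.python.python 0x0000000103d2462b PyImport_Import + 137
3 org.python.python 0x0000000103d22d27 PyImport_ImportModule + 31
4 lpo.so 0x00000001033f45a3 init_numpy() + 19
5 lpo.so 0x00000001033f4779 defineUtil() + 25
6 lpo.so 0x00000001033f4499 init_module_lpo() + 9
7 libboost_python-mt.dylib 0x0000000103c36391 boost::python::handle_exception_impl(boost::function0) + 81
8 libboost_python-mt.dylib 0x0000000103c373b9 boost::python::detail::init_module(char const*, void (*)()) + 121
9 org.python.python 0x0000000101836327 _PyImport_LoadDynamicModule + 140
"""


def address_spans(report):
    """Return {image: (lowest address, highest address)} for parsed frames."""
    addresses = defaultdict(list)
    for line in report.splitlines():
        match = FRAME.match(line)
        if match:  # frame 0 has no 0x-prefixed address and is skipped
            addresses[match.group(2)].append(int(match.group(3), 16))
    return {image: (min(a), max(a)) for image, a in addresses.items()}


if __name__ == "__main__":
    for image, (low, high) in sorted(address_spans(REPORT).items()):
        print("%-28s 0x%x .. 0x%x (%d bytes)" % (image, low, high, high - low))
```

Here the frames labelled org.python.python fall into two groups roughly 37 MiB apart (frames 1 to 3 versus frame 9). That can, but need not, mean two different copies of the Python library are mapped into the process, for example the Python that launched the script and another one that lpo.so or libboost_python was linked against. It is a hypothesis to check with `otool -L` on lpo.so and libboost_python-mt.dylib, not something the report proves.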

Saw the note in external/boost/readme.txt: "In order to use a non-system boost library copy the "boost" and "libs" directory of a recent boost release (eg 1.57) here."

And in build/lib/python/CMakeFiles/lpo.dir/depend.make:

...
lib/python/CMakeFiles/lpo.dir/boost.cpp.o: /usr/local/include/boost/array.hpp
lib/python/CMakeFiles/lpo.dir/boost.cpp.o: /usr/local/include/boost/assert.hpp
lib/python/CMakeFiles/lpo.dir/boost.cpp.o: /usr/local/include/boost/bind.hpp
...

Do these suggest the segfault is due to a boost version mismatch?

Is it sufficient to just do:

ln -s /usr/local/Cellar/boost/1.58.0 external/boost/
ln -s /usr/local/Cellar/boost/1.58.0/lib external/boost/libs

Or something else?

BTW, boost and boost-python were installed as part of setting up Caffe. The Caffe ImageNet model ran successfully when invoked from a Python test app.

Thanks for any light you could help shed.

Aug 04 '15 23:08 peterswang

I don't think that the struct / class thing causes the segfault. Can you try to build it using cmake -DCMAKE_BUILD_TYPE=Debug, and then run either gdb or lldb on it?

Aug 07 '15 04:08 philkr

Rebuilt with cmake -DCMAKE_BUILD_TYPE=Debug and ran it under lldb:

lldb -- python train_lpo.py -f0 0.2 ../models/lpo_VOC_0.2.dat
(lldb) target create "python"
Current executable set to 'python' (x86_64).
(lldb) settings set -- target.run-args "train_lpo.py" "-f0" "0.2" "../models/lpo_VOC_0.2.dat"
(lldb) breakpoint set -f boost.cpp -l 27
Breakpoint 1: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
(lldb) r
Process 91895 launched: '/usr/local/bin/python' (x86_64)
Process 91895 stopped
* thread #1: tid = 0x67a797, 0x00007fff5fc01000 dyld`_dyld_start, stop reason = exec
    frame #0: 0x00007fff5fc01000 dyld`_dyld_start
dyld`_dyld_start:
-> 0x7fff5fc01000 <+0>: popq %rdi
   0x7fff5fc01001 <+1>: pushq $0x0
   0x7fff5fc01003 <+3>: movq %rsp, %rbp
   0x7fff5fc01006 <+6>: andq $-0x10, %rsp
(lldb) br list
Current breakpoints:
1: file = 'boost.cpp', line = 27, locations = 0 (pending)

(lldb) br set -f contour.cpp -l 27
Breakpoint 2: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
(lldb) br set -n BOOST_PYTHON_MODULE
Breakpoint 3: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
(lldb) c
Process 91895 resuming
1 location added to breakpoint 2
Process 91895 stopped
* thread #1: tid = 0x67a797, 0x0000000000000000, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000000000000
error: memory read failed for 0x0

I haven't debugged a Python wrapper with lldb or gdb before. Setting breakpoints directly in the C++ files while running the Python wrapper didn't work.

Could you advise on how to track this down in lldb?

Aug 09 '15 07:08 peterswang

I think it added the breakpoints once lpo was loaded in python. See "1 location added to breakpoint 2". I think you're almost there. What is the backtrace for the EXC_BAD_ACCESS?

Aug 09 '15 14:08 philkr

Thanks for pointing out the lldb message. Here's the debug log and backtrace:

$ lldb -- python train_lpo.py -f0 0.2 ../models/lpo_VOC_0.2.dat
(lldb) target create "python"
Current executable set to 'python' (x86_64).
(lldb) settings set -- target.run-args "train_lpo.py" "-f0" "0.2" "../models/lpo_VOC_0.2.dat"
(lldb) br set -f lpo.cpp -l 38
Breakpoint 1: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
(lldb) r
Process 7734 launched: '/usr/local/bin/python' (x86_64)
Process 7734 stopped
* thread #1: tid = 0x2ab3c, 0x00007fff5fc01000 dyld`_dyld_start, stop reason = exec
    frame #0: 0x00007fff5fc01000 dyld`_dyld_start
dyld`_dyld_start:
-> 0x7fff5fc01000 <+0>: popq %rdi
   0x7fff5fc01001 <+1>: pushq $0x0
   0x7fff5fc01003 <+3>: movq %rsp, %rbp
   0x7fff5fc01006 <+6>: andq $-0x10, %rsp
(lldb) c
Process 7734 resuming
3 locations added to breakpoint 1
Process 7734 stopped
* thread #1: tid = 0x2ab3c, 0x0000000103165a14 lpo.so`init_module_lpo() + 4 at lpo.cpp:38, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000103165a14 lpo.so`init_module_lpo() + 4 at lpo.cpp:38
   35
   36   BOOST_PYTHON_MODULE(lpo) {
   37   /************ Util ***/
-> 38   defineUtil();
   39   #ifdef USE_DATASET
   40   /** Dataset *************/
   41   defineDataset();
(lldb) s
Process 7734 stopped
* thread #1: tid = 0x2ab3c, 0x0000000103166524 lpo.so`defineUtil() + 4 at util.cpp:258, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x0000000103166524 lpo.so`defineUtil() + 4 at util.cpp:258
   255  BOOST_PYTHON_FUNCTION_OVERLOADS(rasterize2,rasterize,1,2)
   256  void defineUtil() {
   257  // NOTE: This file has a ton of macros and templates, so it's going to take a while to compile ...
-> 258  init_numpy();
   259  boost::python::numeric::array::set_module_and_type("numpy", "ndarray");
   260
   261  register_exception_translator<AssertException>(&translateAssertException);
(lldb) s
Process 7734 stopped
* thread #1: tid = 0x2ab3c, 0x00000001031661e4 lpo.so`init_numpy() + 4 at util.cpp:250, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001031661e4 lpo.so`init_numpy() + 4 at util.cpp:250
   247  }
   248  #else
   249  void init_numpy() {
-> 250  import_array();
   251  }
   252  #endif
   253
(lldb) s
Process 7734 stopped
* thread #1: tid = 0x2ab3c, 0x000000010316622f lpo.so`_import_array() + 15 at __multiarray_api.h:1632, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x000000010316622f lpo.so`_import_array() + 15 at __multiarray_api.h:1632
   1629 _import_array(void)
   1630 {
   1631 int st;
-> 1632 PyObject *numpy = PyImport_ImportModule("numpy.core.multiarray");
   1633 PyObject *c_api = NULL;
   1634
   1635 if (numpy == NULL) {
(lldb) s
Process 7734 stopped
* thread #1: tid = 0x2ab3c, 0x0000000000000000, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000000000000
error: memory read failed for 0x0
(lldb) bt
* thread #1: tid = 0x1d5d9, 0x0000000000000000, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x00000001033780dd Python`PyEval_GetGlobals + 23
    frame #2: 0x000000010338762b Python`PyImport_Import + 137
    frame #3: 0x0000000103385d27 Python`PyImport_ImportModule + 31
    frame #4: 0x0000000104003234 lpo.so`_import_array() + 20 at __multiarray_api.h:1632
    frame #5: 0x00000001040031e9 lpo.so`init_numpy() + 9 at util.cpp:250
    frame #6: 0x0000000104003530 lpo.so`defineUtil() + 16 at util.cpp:258
    frame #7: 0x0000000104002a19 lpo.so`init_module_lpo() + 9 at lpo.cpp:38
    frame #8: 0x0000000103299391 libboost_python-mt.dylib`boost::python::handle_exception_impl(boost::function0) + 81
    frame #9: 0x000000010329a3b9 libboost_python-mt.dylib`boost::python::detail::init_module(char const*, void (*)()) + 121
    frame #10: 0x00000001040029fb lpo.so`initlpo + 27 at lpo.cpp:36
    ....

So, it crashed when executing: PyObject *numpy = PyImport_ImportModule("numpy.core.multiarray");

The Python 2.7.10 reference shows: PyObject* PyImport_ImportModule(const char *name)

Does this mean there's a bug in the numpy/core/include/numpy/__multiarray_api.h?
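Before suspecting the numpy header, it may be worth recording exactly which interpreter and which numpy are in play, since the crash is inside PyImport_ImportModule itself and not in numpy code. The sketch below prints that information; `environment_report` is a hypothetical helper name, and comparing its output against what lpo.so was built and linked with is left as a manual step.

```python
import sys
import sysconfig


def environment_report():
    """Collect facts about the running interpreter and its numpy, if any."""
    info = {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "prefix": sys.prefix,
        # Directory holding the Python library this interpreter was built with.
        "libdir": sysconfig.get_config_var("LIBDIR"),
        # Set on OS X framework builds, empty or None otherwise.
        "framework": sysconfig.get_config_var("PYTHONFRAMEWORKPREFIX"),
    }
    try:
        import numpy
        info["numpy_version"] = numpy.__version__
        info["numpy_include"] = numpy.get_include()
    except ImportError:
        info["numpy_version"] = None
        info["numpy_include"] = None
    return info


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("%-14s %s" % (key, value))
```

Running this with the same python that launches train_lpo.py and comparing "libdir" and "numpy_include" with the paths CMake picked up at build time would show whether the extension and the interpreter agree on which Python they belong to.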

Aug 11 '15 07:08 peterswang