SPECT reconstruction segfaults with CIL GradientOperator if STIR_OPENMP=ON
Segmentation fault (core dumped)
After running minimalCIL.py from https://github.com/samdporter/SPECTissues.git, CIL's ADMM causes the failure; OSEM reconstruction runs fine.
I ran valgrind and will post the output in the next comment.
```
==85870== 64,167,432 bytes in 126 blocks are still reachable in loss record 7,248 of 7,250
==85870==    at 0x483C583: operator new[](unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==85870==    by 0x30673740: stir::ProjMatrixByBinSPECTUB::set_up(std::shared_ptr<stir::ProjDataInfo const> const&, std::shared_ptr<stir::DiscretisedDensity<3, float> const> const&) (in /usr/devel/build/INSTALL/python/sirf/_pystir.so)
==85870==    by 0x30682166: stir::ProjectorByBinPair::set_up(std::shared_ptr<stir::ProjDataInfo const> const&, std::shared_ptr<stir::DiscretisedDensity<3, float> const> const&) (in /usr/devel/build/INSTALL/python/sirf/_pystir.so)
==85870==    by 0x306826BC: stir::ProjectorByBinPairUsingProjMatrixByBin::set_up(std::shared_ptr<stir::ProjDataInfo const> const&, std::shared_ptr<stir::DiscretisedDensity<3, float> const> const&) (in /usr/devel/build/INSTALL/python/sirf/_pystir.so)
==85870==    by 0x3059DB54: sirf::PETAcquisitionModel::set_up(std::shared_ptr<sirf::PETAcquisitionData>, std::shared_ptr<sirf::STIRImageData>) (in /usr/devel/build/INSTALL/python/sirf/_pystir.so)
==85870==    by 0x3056A3F6: sirf::PETAcquisitionModelUsingMatrix::set_up(std::shared_ptr<sirf::PETAcquisitionData>, std::shared_ptr<sirf::STIRImageData>) (in /usr/devel/build/INSTALL/python/sirf/_pystir.so)
==85870==    by 0x30554F40: cSTIR_setupAcquisitionModel (in /usr/devel/build/INSTALL/python/sirf/_pystir.so)
==85870==    by 0x3052DE33: _wrap_cSTIR_setupAcquisitionModel (in /usr/devel/build/INSTALL/python/sirf/_pystir.so)
==85870==    by 0x5F3A29: PyCFunction_Call (in /usr/bin/python3.8)
==85870==    by 0x5F3E1D: _PyObject_MakeTpCall (in /usr/bin/python3.8)
==85870==    by 0x570673: _PyEval_EvalFrameDefault (in /usr/bin/python3.8)
==85870==    by 0x5F6835: _PyFunction_Vectorcall (in /usr/bin/python3.8)
==85870==
==85870== 113,746,225 bytes in 1 blocks are still reachable in loss record 7,249 of 7,250
==85870==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==85870==    by 0x5F8A00: PyBytes_FromStringAndSize (in /usr/bin/python3.8)
==85870==    by 0x5F903B: ??? (in /usr/bin/python3.8)
==85870==    by 0x646F95: ??? (in /usr/bin/python3.8)
==85870==    by 0x5C4675: ??? (in /usr/bin/python3.8)
==85870==    by 0x4F290D: ??? (in /usr/bin/python3.8)
==85870==    by 0x64F717: ??? (in /usr/bin/python3.8)
==85870==    by 0x5048B2: ??? (in /usr/bin/python3.8)
==85870==    by 0x56B1D9: _PyEval_EvalFrameDefault (in /usr/bin/python3.8)
==85870==    by 0x569399: _PyEval_EvalCodeWithName (in /usr/bin/python3.8)
==85870==    by 0x5F6A12: _PyFunction_Vectorcall (in /usr/bin/python3.8)
==85870==    by 0x56B1D9: _PyEval_EvalFrameDefault (in /usr/bin/python3.8)
==85870==
==85870== 240,458,752 bytes in 2 blocks are still reachable in loss record 7,250 of 7,250
==85870==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==85870==    by 0x5806F33: default_malloc (in /usr/local/lib/python3.8/dist-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so)
==85870==    by 0x580764E: PyDataMem_UserNEW (in /usr/local/lib/python3.8/dist-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so)
==85870==    by 0x58644B9: PyArray_NewFromDescr_int (in /usr/local/lib/python3.8/dist-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so)
==85870==    by 0x580ABB7: array_new (in /usr/local/lib/python3.8/dist-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so)
==85870==    by 0x5F3D02: _PyObject_MakeTpCall (in /usr/bin/python3.8)
==85870==    by 0x570AF8: _PyEval_EvalFrameDefault (in /usr/bin/python3.8)
==85870==    by 0x569399: _PyEval_EvalCodeWithName (in /usr/bin/python3.8)
==85870==    by 0x5F6A12: _PyFunction_Vectorcall (in /usr/bin/python3.8)
==85870==    by 0x56C28B: _PyEval_EvalFrameDefault (in /usr/bin/python3.8)
==85870==    by 0x5F6835: _PyFunction_Vectorcall (in /usr/bin/python3.8)
==85870==    by 0x59DDFE: ??? (in /usr/bin/python3.8)
==85870==
==85870== LEAK SUMMARY:
==85870==    definitely lost: 1,816 bytes in 15 blocks
==85870==    indirectly lost: 0 bytes in 0 blocks
==85870==    possibly lost: 699,830 bytes in 629 blocks
==85870==    still reachable: 514,605,355 bytes in 80,204 blocks
==85870==                       of which reachable via heuristic:
==85870==                         stdstring          : 25,582 bytes in 681 blocks
==85870==                         newarray           : 248,504 bytes in 247 blocks
==85870==    suppressed: 0 bytes in 0 blocks
==85870==
==85870== ERROR SUMMARY: 78258 errors from 554 contexts (suppressed: 0 from 0)
```
Possible memory leak?
Results from gdb:
```
Thread 34 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fc781aad700 (LWP 261805)]
__gnu_cxx::__atomic_add_dispatch (__val=1, __mem=0x249) at /usr/include/c++/9/ext/atomicity.h:96
96            __atomic_add(__mem, __val);
#0  __gnu_cxx::__atomic_add_dispatch (__val=1, __mem=0x219) at /usr/include/c++/9/ext/atomicity.h:96
#1  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy (this=0x211) at /usr/include/c++/9/bits/shared_ptr_base.h:139
#2  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::operator= (__r=..., this=
(gdb) up
#1  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy (this=0x211) at /usr/include/c++/9/bits/shared_ptr_base.h:139
139           { __gnu_cxx::__atomic_add_dispatch(&_M_use_count, 1); }
(gdb) up
#2  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::operator= (__r=..., this=
```
Hmmm, potentially multi-threading stuff; could be a red herring. Nevertheless, can you please compile STIR without OpenMP:
```
cd ..../builds/STIR/build
cmake -DSTIR_OPENMP:BOOL=OFF .
make -j6 install
cd ../../SIRF/build
make -j6 install
```
Your valgrind log is from a build without debug info, so we don't have any line information. Can you re-run? Probably best to attach it as a log file though, not paste the actual content.
Ideally, re-run before building the non-OpenMP version.
This is the valgrind output with debug info enabled (I've checked that it's on in CMake, but the output looks the same to me).
Looks like it is an issue with multi-threading. Everything runs fine with OpenMP off.
Hi, the problem seems to arise after creating a CIL `GradientOperator` with the C backend; see minimal.ipynb. The final cell fails.
The algorithm runs fine when using the Numpy backend. Should I raise this as a CIL issue?
Good detective work! Pretty weird though. Does this then depend on SPECT at all? I.e. can you just save an image first, then load it in a new kernel and do the CIL operation (see the sketch below)?
tagging @epapoutsellis @gfardell @paskino
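Something along these lines, as a sketch (assuming `fdg` is the SIRF `ImageData` from the notebook; the file name is a placeholder):

```python
# Kernel 1: save the image produced by the SPECT pipeline
fdg.write('fdg_image.hv')   # placeholder file name, Interfile output

# Kernel 2: a fresh process with no SPECT objects at all
import sirf.STIR as STIR
from cil.optimisation import operators

img = STIR.ImageData('fdg_image.hv')
grad = operators.GradientOperator(img, backend='c')  # the backend that fails
out = grad.direct(img)   # does this still crash without SPECT in the process?
```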
So minimal.ipynb does not work, but it runs OK with `backend='numpy'` in the `GradientOperator`?
Yes exactly
`GradientOperator` with the C backend expects contiguous data, as in a numpy ndarray or a `float*` C array. I think it gets passed a SIRF object which it has no clue what to do with, so I think this is expected. Use `backend='numpy'`.
The link you provide to minimal.ipynb returns a 404.
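In code, the suggested workaround would be something like this (a sketch; `fdg` as in the notebook):

```python
from cil.optimisation import operators

# The C backend assumes contiguous numpy / float* storage, so force the
# numpy backend when the operator's domain is a SIRF object:
grad = operators.GradientOperator(fdg, backend='numpy')
g = grad.direct(fdg)
```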
Sorry the repository was private. Try now
> `GradientOperator` with C backend expects contiguous data as in a numpy ndarray or a `float*` C array. I think it gets passed a SIRF object with which it doesn't have any clue what to do. So, I think this is expected. Use `backend='numpy'`.
>
> The link you provide to minimal.ipynb returns a 404.
Why would this be different for PET?
Just to understand, the last cell contains

```python
### fails ###
tmp = am.forward(fdg)
```

where `am` is defined above as `am = STIR.AcquisitionModelUsingMatrix(acq_model_matrix)`. As far as I can understand, this has nothing to do with the cell above containing the `GradientOperator`,

```python
grad = operators.GradientOperator(fdg, method='backward')
```

where you have probably also unveiled a bug, as the `method` is not saved into the instance and therefore not used. IMHO it should even fail.
```python
grad = operators.GradientOperator(fdg, method='backward')
```

This actually runs because `method='backward'` uses the Numpy backend. I've updated the notebook with a C-backend gradient operator that does fail (apologies).

I agree that it has nothing to do with the `am` operator. It's very confusing. In the notebook I was trying to show all the things that you can do with no problems, and then the final two cells cause a problem (if that makes sense). The last cell runs when the `GradientOperator` in the cell above uses the numpy backend, and fails when it uses the C backend.
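To condense the failing sequence from the notebook (a sketch; `am`, `acq_model_matrix` and `fdg` as defined there, with template/set_up calls omitted):

```python
# Earlier cells: the SPECT acquisition model projects without problems
am = STIR.AcquisitionModelUsingMatrix(acq_model_matrix)
tmp = am.forward(fdg)                                  # OK

# Penultimate cell: gradient operator with the C backend
grad = operators.GradientOperator(fdg, backend='c')

# Final cell: the same projection now segfaults (only when the C backend
# was used above; with backend='numpy' everything runs fine)
tmp = am.forward(fdg)
```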
No, the `method` is passed over via the kwargs.
What's the situation with this?
Do we have a SPECT problem, a CIL problem, or do we not know yet?
@samdporter, did you set `keep_all_views_in_cache`? If not, can you set it to true? This should enable multi-threading for SPECT (and avoid the horrible/dangerous work-around in https://github.com/UCL/STIR/blob/25b65645f9b532ed5129333c096a25faab8ae07a/src/recon_buildblock/ProjMatrixByBinSPECTUB.cxx#L244-L248). Note that this work-around shouldn't do anything if we `set_num_threads(1)`, at least for STIR. I have no idea what `omp_set_num_threads` does when called in STIR, given that OpenMP is also used in CIL. @gfardell any idea?
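For reference, this is roughly how it could be set from SIRF (a sketch, assuming the matrix is a `STIR.SPECTUBMatrix` and that SIRF exposes this setter; please check):

```python
acq_model_matrix = STIR.SPECTUBMatrix()
# should enable multi-threaded SPECT projection and avoid the work-around above
acq_model_matrix.set_keep_all_views_in_cache(True)
am = STIR.AcquisitionModelUsingMatrix(acq_model_matrix)
```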
Latest update from @samdporter on Discord is that PET_SPECT_Synergestic works provided the CIL `GradientOperator` uses the numpy backend.
Hi, update: setting up and running two SPECT reconstructions as in https://github.com/samdporter/SPECTissues/blob/main/OSEM.py results in an error:

```
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
Aborted (core dumped)
```
I'm having trouble building SIRF with debug info, but will comment when I'm able to. I believe that I have all the relevant updates to SPECTUBMatrix in my local build.
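For context, the failing pattern is roughly the following (a condensed, hypothetical sketch of what OSEM.py does; variable names and subset/subiteration numbers are placeholders, not the script's actual values):

```python
import sirf.STIR as STIR

def reconstruct(acq_data, initial_image):
    # one complete SPECT OSEM reconstruction
    matrix = STIR.SPECTUBMatrix()
    am = STIR.AcquisitionModelUsingMatrix(matrix)
    obj_fun = STIR.make_Poisson_loglikelihood(acq_data)
    obj_fun.set_acquisition_model(am)
    recon = STIR.OSMAPOSLReconstructor()
    recon.set_objective_function(obj_fun)
    recon.set_num_subsets(4)           # placeholder
    recon.set_num_subiterations(8)     # placeholder
    recon.set_up(initial_image)
    recon.set_current_estimate(initial_image)
    recon.process()
    return recon.get_current_estimate()

out1 = reconstruct(acq_data1, init1)   # first reconstruction: fine
out2 = reconstruct(acq_data2, init2)   # second one aborts with the error above
```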
We need the text of the exception. Presumably it was redirected to a file?