Segfault while using ColPack coloring method
Hello,
I'm experiencing random segfaults when using the ColPack coloring method to compute a sparse Jacobian. I can reproduce them with the sparse example (example_sparse). To make them occur more frequently, I replaced this line of sparse.cpp:
Run( colpack_jacobian, "colpack_jacobian" );
with
for (int i = 0; i < 10000; i++) {
Run( colpack_jacobian, "colpack_jacobian" );
}
This problem might be related to this issue: coin-or/Adol-C/19, and in that case the problem could come from ColPack. I'm not completely convinced it's the same issue, because the bug in Adol-C only appears with column coloring and not with row coloring, and CppAD uses row coloring.
Additional information
OS: Debian 10
CppAD version: 20200000.2
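Another way to measure how often the crash occurs, without editing sparse.cpp, is to rerun the unmodified binary from the shell and count the runs that die with a segmentation fault. This is only a minimal sketch, assuming a POSIX shell and that a segfault shows up as exit status 139 (128 + SIGSEGV); `count_segfaults` is a hypothetical helper name, not part of CppAD, and separate processes may not crash at the same rate as a loop inside one process.

```shell
# Hypothetical helper: run a command N times and print how many runs
# ended with exit status 139 (128 + 11, i.e. killed by SIGSEGV).
count_segfaults() {
    runs=$1
    shift
    crashes=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the program's own output; only the exit status matters.
        "$@" > /dev/null 2>&1
        status=$?
        if [ "$status" -eq 139 ]; then
            crashes=$((crashes + 1))
        fi
        i=$((i + 1))
    done
    echo "$crashes"
}
```

For example, `count_segfaults 100 ./example_sparse` run from the directory containing the test binary would print the number of crashing runs out of 100.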
Sorry for the slow response. I must have missed the e-mail informing me of this issue.
I have tried to reproduce this error (on a Fedora 33 system) and cannot. I think it may be an issue of the version of ColPack that is linked to CppAD in the Debian release. Here is what works for me and should work for you on your system:
- I built a local copy of cppad as follows:
git clone https://github.com/coin-or/CppAD.git cppad.git
cd cppad.git
git checkout 20200000.2
bin/get_colpack.sh
cd build
libdir=$(find prefix -name 'libColPack.*' | head -1 | sed -e 's|prefix/\([^/]*\)/.*|\1|')
cmake -D colpack_prefix=$(pwd)/prefix -D cmake_install_libdirs="$libdir" ..
make check_example_sparse
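The `libdir` line above is the least obvious step: it takes the first `libColPack.*` path found under `prefix` and keeps only the directory name directly below `prefix`. The sketch below isolates that `sed` expression so it can be tried on a sample path; `libdir_of` is a hypothetical helper name and the sample path is an assumption about where ColPack might be installed, not output from a real build.

```shell
# Hypothetical helper: apply the same sed expression as the libdir line
# to one path and print the directory component just below "prefix".
libdir_of() {
    printf '%s\n' "$1" | sed -e 's|prefix/\([^/]*\)/.*|\1|'
}

# Example with an assumed install location.
libdir_of prefix/lib64/libColPack.so
```

With the assumed path this prints `lib64`, which is the value that would be passed to `cmake_install_libdirs`.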
All the tests passed (for me). I then edited the file ../example/sparse/sparse.cpp as follows:
- below
// This line is used by test_one.sh
I added the following text:
for (int i = 0; i < 10000; i++) {
Run( colpack_jacobian, "colpack_jacobian" );
}
# if 0
- above
// check for memory leak
I added the following text:
# endif
I then re-ran the following command (in the build directory):
make check_example_sparse
This time I got 10,000 lines with
colpack_jacobian OK
- I then ran the command
valgrind --leak-check=yes example/sparse/example_sparse
And got the following message at the end:
==1563847== HEAP SUMMARY:
==1563847== in use at exit: 0 bytes in 0 blocks
==1563847== total heap usage: 1,590,060 allocs, 1,590,060 frees, 683,153,528 bytes allocated
==1563847==
==1563847== All heap blocks were freed -- no leaks are possible
==1563847==
==1563847== For lists of detected and suppressed errors, rerun with: -s
==1563847== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
I've followed the exact same steps as you, and I still get segfaults when I change the code to run the test 10000 times. With the valgrind command I got the following message:
==24200== HEAP SUMMARY:
==24200== in use at exit: 5,472 bytes in 11 blocks
==24200== total heap usage: 1,790,071 allocs, 1,790,060 frees, 685,990,776 bytes allocated
==24200==
==24200== 2,128 bytes in 7 blocks are possibly lost in loss record 4 of 5
==24200== at 0x4837B65: calloc (vg_replace_malloc.c:752)
==24200== by 0x40116D1: allocate_dtv (dl-tls.c:286)
==24200== by 0x401203D: _dl_allocate_tls (dl-tls.c:532)
==24200== by 0x5072B95: allocate_stack (allocatestack.c:621)
==24200== by 0x5072B95: pthread_create@@GLIBC_2.2.5 (pthread_create.c:669)
==24200== by 0x504AD61: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==24200== by 0x5041E09: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==24200== by 0x48B3A29: ColPack::BipartiteGraphPartialColoring::PartialDistanceTwoRowColoring_OMP() (in /usr/lib/x86_64-linux-gnu/libColPack.so.0.0.0)
==24200== by 0x48B48A7: ColPack::BipartiteGraphPartialColoringInterface::PartialDistanceTwoColoring(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (in /usr/lib/x86_64-linux-gnu/libColPack.so.0.0.0)
==24200== by 0x4855BA3: CppAD::local::cppad_colpack_general(CppAD::vector<unsigned long>&, unsigned long, unsigned long, CppAD::vector<unsigned int*> const&) (cppad_colpack.cpp:73)
==24200== by 0x15A140: void CppAD::local::color_general_colpack<CppAD::local::sparse::list_setvec, CppAD::vector<unsigned long> >(CppAD::local::sparse::list_setvec const&, CppAD::vector<unsigned long> const&, CppAD::vector<unsigned long> const&, CppAD::vector<unsigned long>&) (color_general.hpp:268)
==24200== by 0x190811: unsigned long CppAD::ADFun<double, double>::SparseJacobianFor<CppAD::vector<double>, CppAD::local::sparse::list_setvec, CppAD::vector<unsigned long> >(CppAD::vector<double> const&, CppAD::local::sparse::list_setvec&, CppAD::vector<unsigned long> const&, CppAD::vector<unsigned long> const&, CppAD::vector<double>&, CppAD::sparse_jacobian_work&) (sparse_jacobian.hpp:415)
==24200== by 0x190120: unsigned long CppAD::ADFun<double, double>::SparseJacobianForward<CppAD::vector<double>, std::vector<std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >, std::allocator<std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> > > >, CppAD::vector<unsigned long> >(CppAD::vector<double> const&, std::vector<std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >, std::allocator<std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> > > > const&, CppAD::vector<unsigned long> const&, CppAD::vector<unsigned long> const&, CppAD::vector<double>&, CppAD::sparse_jacobian_work&) (sparse_jacobian.hpp:784)
==24200==
==24200== LEAK SUMMARY:
==24200== definitely lost: 0 bytes in 0 blocks
==24200== indirectly lost: 0 bytes in 0 blocks
==24200== possibly lost: 2,128 bytes in 7 blocks
==24200== still reachable: 3,344 bytes in 4 blocks
==24200== suppressed: 0 bytes in 0 blocks
==24200== Reachable blocks (those to which a pointer was found) are not shown.
==24200== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==24200==
==24200== For counts of detected and suppressed errors, rerun with: -v
==24200== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
(There were no segfaults when running under Valgrind; I got OK for all the tests.)
Anyway, I was using ColPack because my benchmarks had shown better performance with it, but that was probably just because of a different compilation flag between ColPack and CppAD. So now I'm using the default coloring method of CppAD, and everything works perfectly.
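The Valgrind stack above passes through `PartialDistanceTwoRowColoring_OMP` and `GOMP_parallel`, so this ColPack build runs the coloring under OpenMP. One check that might narrow things down, assuming the crash is related to threading (which nothing in this thread establishes), is to rerun the test with OpenMP limited to a single thread through the standard `OMP_NUM_THREADS` environment variable. A minimal sketch; `run_single_threaded` is a hypothetical wrapper name:

```shell
# Hypothetical wrapper: run a command with OpenMP limited to one thread.
# env sets the variable only for the command it launches.
run_single_threaded() {
    env OMP_NUM_THREADS=1 "$@"
}
```

For example, `run_single_threaded ./example_sparse`. If the segfaults stop with one thread, that would point toward a data race in the parallel coloring rather than toward CppAD, though it would not prove it.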
@Saelyos Do you get the error if you use the master branch?
Yes, I've followed the same steps with master, and I still get the error.
I changed my loop to execute 100,000 times:
cppad.git>git diff
diff --git a/example/sparse/sparse.cpp b/example/sparse/sparse.cpp
index e9bf09f5b..cddacaa29 100644
--- a/example/sparse/sparse.cpp
+++ b/example/sparse/sparse.cpp
@@ -64,6 +64,10 @@ int main(void)
CppAD::test_boolofvoid Run(group, width);
// This line is used by test_one.sh
+ for (int i = 0; i < 100000; i++) {
+ Run( colpack_jacobian, "colpack_jacobian" );
+ }
+# if 0
// BEGIN_SORT_THIS_LINE_PLUS_2
// external compiled tests
@@ -102,6 +106,7 @@ int main(void)
Run( sparse2eigen, "sparse2eigen" );
# endif
//
+# endif
// check for memory leak
bool memory_ok = CppAD::thread_alloc::free_all();
// print summary at end
cppad.git>
Then in the build/example/sparse directory I executed the command:
sparse>valgrind ./example_sparse > junk
==31944== Memcheck, a memory error detector
==31944== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==31944== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==31944== Command: ./example_sparse
==31944==
==31944==
==31944== HEAP SUMMARY:
==31944== in use at exit: 0 bytes in 0 blocks
==31944== total heap usage: 15,900,125 allocs, 15,900,125 frees, 6,703,210,377 bytes allocated
==31944==
==31944== All heap blocks were freed -- no leaks are possible
==31944==
==31944== For lists of detected and suppressed errors, rerun with: -s
==31944== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Perhaps you could figure out how to change the file bin/get_colpack.sh so that it builds a debug version of the ColPack library, and then run the program in the debugger. That might give us some more information.
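Even before ColPack is rebuilt with debug symbols, a backtrace taken at the moment of the crash may already show which routine fails. The following is a minimal sketch, assuming gdb is installed; `backtrace_on_crash` is a hypothetical helper name, and without debug symbols the frames inside libColPack will show only function names, not source lines.

```shell
# Hypothetical helper: run a program under gdb in batch mode and, once it
# stops (for example on SIGSEGV), print a backtrace for every thread.
backtrace_on_crash() {
    gdb -batch -ex run -ex 'thread apply all bt' --args "$@"
}
```

For example, `backtrace_on_crash ./example_sparse`. If the program exits normally, gdb has no stack to print, so the command may need to be repeated until a crash occurs.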
@Saelyos
When I used Adol-C, I encountered the same problem as you. Have you solved it? Or does the problem go away just by using CppAD instead?
I haven't solved this problem, and I haven't had time to investigate why it fails when I use ColPack with CppAD or Adol-C. Fortunately, using CppAD without ColPack works perfectly for me.
Thank you so much.