Cytnx
segmentation fault for cytnx::linalg::_Lanczos_Gnd_general_Ut
I get a segmentation fault when running DMRG. It does not happen on every run, only intermittently, so I suspect it is related to Lanczos convergence. Below are the warning and error messages, followed by the lldb output locating where the fault occurred.
[WARNING] Lanczos_Gnd -> Tridiag error:
Lanczos continues automatically.
# Cytnx warning occur at void cytnx::linalg::_Lanczos_Gnd_general_Ut(std::vector<cytnx::UniTensor>&, cytnx::LinOp*, const cytnx::UniTensor&, const bool&, const double&, const unsigned int&, const bool&)
# warning: [WARNING] iteration not converge after Maxiter!.
:: Note :: ignore if this is intended
# file : /home/chiamin/cytnx_dev/Cytnx/src/linalg/Lanczos_Gnd_Ut.cpp (169)
Process 615333 stopped
* thread #1, name = 'python3', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
frame #0: 0x00007ffff58eae9b cytnx.cpython-310-x86_64-linux-gnu.so`___lldb_unnamed_symbol15981 + 107
cytnx.cpython-310-x86_64-linux-gnu.so`___lldb_unnamed_symbol15981:
-> 0x7ffff58eae9b <+107>: movq (%rbx), %rax
0x7ffff58eae9e <+110>: testq %rax, %rax
0x7ffff58eaea1 <+113>: je 0x7ffff58eb16e ; <+830>
0x7ffff58eaea7 <+119>: lock
(lldb) up
frame #1: 0x00007ffff5bf625c cytnx.cpython-310-x86_64-linux-gnu.so`cytnx::linalg::_Lanczos_Gnd_general_Ut(std::vector<cytnx::UniTensor, std::allocator<cytnx::UniTensor> >&, cytnx::LinOp*, cytnx::UniTensor const&, bool const&, double const&, unsigned int const&, bool const&) + 10684
cytnx.cpython-310-x86_64-linux-gnu.so`cytnx::linalg::_Lanczos_Gnd_general_Ut:
-> 0x7ffff5bf625c <+10684>: lock
0x7ffff5bf625d <+10685>: addq $0x1, 0x143dfc3(%rip)
0x7ffff5bf6265 <+10693>: lock
0x7ffff5bf6266 <+10694>: addq $0x1, 0x143ca12(%rip)
(lldb)
frame #2: 0x00007ffff5bf81f3 cytnx.cpython-310-x86_64-linux-gnu.so`cytnx::linalg::Lanczos_Gnd_Ut(cytnx::LinOp*, cytnx::UniTensor const&, double const&, bool const&, bool const&, unsigned int const&) + 963
cytnx.cpython-310-x86_64-linux-gnu.so`cytnx::linalg::Lanczos_Gnd_Ut:
-> 0x7ffff5bf81f3 <+963>: movq -0x50(%rbp), %rdi
0x7ffff5bf81f7 <+967>: popq %rax
0x7ffff5bf81f8 <+968>: popq %rdx
0x7ffff5bf81f9 <+969>: testq %rdi, %rdi
frame #3: 0x00007ffff5be8e4f cytnx.cpython-310-x86_64-linux-gnu.so`cytnx::linalg::Lanczos(cytnx::LinOp*, cytnx::UniTensor const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, double const&, unsigned int const&, unsigned long const&, bool const&, bool const&, unsigned int const&, bool const&) + 271
cytnx.cpython-310-x86_64-linux-gnu.so`cytnx::linalg::Lanczos:
-> 0x7ffff5be8e4f <+271>: lock
0x7ffff5be8e50 <+272>: addq $0x1, 0x1439ad8(%rip)
0x7ffff5be8e58 <+280>: popq %rax
0x7ffff5be8e59 <+281>: popq %rdx
(lldb)
frame #4: 0x00007ffff598fe3c cytnx.cpython-310-x86_64-linux-gnu.so`___lldb_unnamed_symbol18013 + 1740
cytnx.cpython-310-x86_64-linux-gnu.so`___lldb_unnamed_symbol18013:
-> 0x7ffff598fe3c <+1740>: movq -0xd0(%rbp), %rdi
0x7ffff598fe43 <+1747>: addq $0x30, %rsp
0x7ffff598fe47 <+1751>: cmpq -0x118(%rbp), %rdi
0x7ffff598fe4e <+1758>: je 0x7ffff598ff20 ; <+1968>
(lldb)
frame #5: 0x00007ffff5848556 cytnx.cpython-310-x86_64-linux-gnu.so`___lldb_unnamed_symbol15056 + 7670
cytnx.cpython-310-x86_64-linux-gnu.so`___lldb_unnamed_symbol15056:
-> 0x7ffff5848556 <+7670>: movq %rax, %r12
0x7ffff5848559 <+7673>: lock
0x7ffff584855a <+7674>: addq $0x1, 0x15eb506(%rip) ; __bss_start + 27591
0x7ffff5848562 <+7682>: movq %r14, %rdi
(lldb)
frame #6: 0x00000000004fca37 python3`cfunction_call at methodobject.c:543:19
(lldb)
frame #7: 0x00000000004f64bb python3`_PyObject_MakeTpCall at call.c:215:18
(lldb)
frame #8: 0x00000000004f29f7 python3`_PyEval_EvalFrameDefault at abstract.h:112:16
(lldb)
frame #9: 0x00000000004fce7f python3`_PyFunction_Vectorcall at pycore_ceval.h:46:12
(lldb)
frame #10: 0x00000000004f1e5d python3`_PyEval_EvalFrameDefault at abstract.h:114:11
(lldb)
frame #11: 0x0000000000592912 python3`_PyEval_Vector at pycore_ceval.h:46:12
(lldb)
This seems to be related to convergence in Lanczos: if I switch to a better initial state, the segfault usually goes away.
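To illustrate why the initial state can matter, here is a minimal pure-Python sketch of a Lanczos tridiagonalization that reports breakdown (a vanishing off-diagonal element) instead of continuing. This is not the Cytnx implementation, and I have not confirmed that breakdown is what triggers the segfault. The names `lanczos_tridiag`, `matvec`, `dot`, and `breakdown_tol` are made up for this example. It only shows that some starting vectors make the recurrence terminate early, which is one situation a solver has to guard against.

```python
import math


def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lanczos_tridiag(A, v0, max_iter, breakdown_tol=1e-12):
    """Run up to max_iter Lanczos steps on symmetric A starting from v0.

    Returns (alphas, betas, broke_down): the diagonal and off-diagonal
    entries of the tridiagonal matrix, plus a flag that is True when the
    residual norm fell below breakdown_tol (hypothetical threshold).
    """
    norm0 = math.sqrt(dot(v0, v0))
    if norm0 == 0.0:
        raise ValueError("initial vector must be nonzero")
    q_prev = [0.0] * len(v0)
    q = [x / norm0 for x in v0]
    alphas, betas = [], []
    beta = 0.0
    for _ in range(max_iter):
        w = matvec(A, q)
        alpha = dot(q, w)
        alphas.append(alpha)
        # Three-term recurrence: remove components along q and q_prev.
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        beta = math.sqrt(dot(w, w))
        if beta < breakdown_tol:
            # Krylov space is exhausted; dividing by beta would be unsafe.
            return alphas, betas, True
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
    return alphas, betas, False


A = [[1.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 3.0]]

# Starting from an exact eigenvector: breakdown on the first step.
print(lanczos_tridiag(A, [1.0, 0.0, 0.0], max_iter=5))  # ([1.0], [], True)

# Starting from a generic vector: two steps complete without breakdown.
print(lanczos_tridiag(A, [1.0, 1.0, 1.0], max_iter=2))
```

If something similar happens inside `_Lanczos_Gnd_general_Ut`, checking for this condition before using the result might avoid the null access, but that is only a guess from the backtrace.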