Python backend crash

Open Tsingjie89 opened this issue 2 years ago • 3 comments

Description
The Python backend may crash when running multiple model instances in CPU mode.

Triton Information
What version of Triton are you using? 22.04

Are you using the Triton container or did you build it yourself? The Triton container.

To Reproduce

Recognition model config:

    name: "rec_ch_cpu"
    backend: "paddle"
    max_batch_size: 6
    input [
      {
        name: "x"
        data_type: TYPE_FP32
        dims: [ 3, 48, -1 ]
      }
    ]
    output [
      {
        name: "softmax_5.tmp_0"
        data_type: TYPE_FP32
        dims: [ -1, 6625 ]
      }
    ]
    instance_group [
      {
        count: 4
        kind: KIND_CPU
      }
    ]
    optimization {
      execution_accelerators {
        cpu_execution_accelerator: [
          {
            name: "mkldnn"
            parameters { key: "cpu_threads" value: "5" }
          }
        ]
      }
    }
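The recognition model runs four CPU instances, each configured with `cpu_threads: 5` for MKL-DNN. As a rough sizing sketch, the snippet below multiplies the two values. It assumes, without confirmation from this thread, that each instance gets its own pool of `cpu_threads` threads; `estimate_threads` is a hypothetical helper, not a Triton or Paddle API.

```python
import os


def estimate_threads(instance_count: int, cpu_threads_per_instance: int) -> int:
    """Worst-case inference threads if every model instance runs at once.

    Hypothetical model: each instance owns a separate pool of
    `cpu_threads_per_instance` threads (not confirmed in the issue).
    """
    if instance_count < 1 or cpu_threads_per_instance < 1:
        raise ValueError("counts must be positive")
    return instance_count * cpu_threads_per_instance


# From the rec_ch_cpu config: instance_group count: 4, cpu_threads value: "5".
rec_threads = estimate_threads(instance_count=4, cpu_threads_per_instance=int("5"))
print(rec_threads)  # 20

# Logical CPUs on the host (may be None if it cannot be determined).
print(os.cpu_count())
```

Note that `os.cpu_count()` reports the host's logical CPUs, not any CPU limit applied to the container.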

Python backend config:

    name: "ocr_lite_rec"
    backend: "python"
    input [
      {
        name: "INPUT_0"
        data_type: TYPE_STRING
        dims: [ -1 ]
      }
    ]
    input [
      {
        name: "INPUT_1"
        data_type: TYPE_STRING
        dims: [ -1 ]
      }
    ]
    output [
      {
        name: "OUTPUT"
        data_type: TYPE_STRING
        dims: [ -1 ]
      }
    ]
    instance_group [
      {
        count: 2
        kind: KIND_CPU
      }
    ]

Test data: 84 images, 50 bounding boxes per image.
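To give a sense of the load this test data puts on `rec_ch_cpu`, the sketch below splits each image's 50 crops into batches no larger than `max_batch_size: 6`. It assumes, hypothetically, that the Python model sends one request per batch of crops from a single image; the actual `model.py` is not shown in the thread, and `chunk` is an illustrative helper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], max_batch_size: int) -> List[Sequence[T]]:
    """Split items into consecutive batches of at most max_batch_size elements."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    return [items[i:i + max_batch_size] for i in range(0, len(items), max_batch_size)]


# Figures from the issue: 84 images, 50 bounding boxes each, max_batch_size: 6.
IMAGES = 84
BBOXES_PER_IMAGE = 50
MAX_BATCH_SIZE = 6

# Hypothetical: one rec_ch_cpu request per batch of crops from a single image.
batches = chunk(list(range(BBOXES_PER_IMAGE)), MAX_BATCH_SIZE)
batch_sizes = [len(b) for b in batches]
total_crops = IMAGES * BBOXES_PER_IMAGE
total_requests = IMAGES * len(batches)

print(batch_sizes)     # [6, 6, 6, 6, 6, 6, 6, 6, 2]
print(total_crops)     # 4200
print(total_requests)  # 756
```

Under that assumption each image needs 9 requests (eight full batches of 6 plus one batch of 2), or 756 requests across the 84 images.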

Expected behavior

Core dump info:

    Core was generated by `/opt/tritonserver/backends/python/triton_python_backend_stub /workspace/ocr_lit'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x00005594dc325f49 in boost::intrusive::bstree_algorithms_base<boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true> >::next_node(boost::interprocess::offset_ptr<boost::intrusive::compact_rbtree_node<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul> >, long, unsigned long, 0ul> const&) ()
    [Current thread is 1 (Thread 0x7f98c1cfc000 (LWP 125682))]
    (gdb) bt
    #0  0x00005594dc325f49 in boost::intrusive::bstree_algorithms_base<boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true> >::next_node(boost::interprocess::offset_ptr<boost::intrusive::compact_rbtree_node<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul> >, long, unsigned long, 0ul> const&) ()
    #1  0x00005594dc32e8b5 in boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::priv_deallocate(void*) ()
    #2  0x00005594dc32eea4 in std::_Function_handler<void (char*), triton::backend::python::AllocatedSharedMemory triton::backend::python::SharedMemoryManager::WrapObjectInUniquePtr(char*, triton::backend::python::AllocatedShmOwnership*, long const&)::{lambda(char*)#1}>::_M_invoke(std::_Any_data const&, char*&&) ()
    #3  0x00005594dc33f2bd in triton::backend::python::PbTensor::~PbTensor() ()
    #4  0x00005594dc343261 in std::_Sp_counted_ptr_inplace<triton::backend::python::PbTensor, std::allocator<triton::backend::python::PbTensor>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
    #5  0x00005594dc319e55 in triton::backend::python::InferRequest::~InferRequest() ()
    #6  0x00005594dc319f76 in std::_Sp_counted_ptr<triton::backend::python::InferRequest*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
    #7  0x00005594dc31b598 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
    #8  0x00005594dc31b71a in pybind11::class_<triton::backend::python::InferRequest, std::shared_ptr<triton::backend::python::InferRequest> >::dealloc(pybind11::detail::value_and_holder&) ()
    #9  0x00005594dc309c27 in pybind11::detail::clear_instance(_object*) ()
    #10 0x00005594dc30aba3 in pybind11_object_dealloc ()
    #11 0x00007f98c2334dd3 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #12 0x00007f98c254c865 in _PyGen_Send () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #13 0x00007f98b8839ef9 in ?? () from /usr/lib/python3.8/lib-dynload/_asyncio.cpython-38-x86_64-linux-gnu.so
    #14 0x00007f98b88390ac in ?? () from /usr/lib/python3.8/lib-dynload/_asyncio.cpython-38-x86_64-linux-gnu.so
    #15 0x00007f98c255db1b in _PyObject_MakeTpCall () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #16 0x00007f98c246a8a3 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #17 0x00007f98c251444f in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #18 0x00007f98c255d830 in PyVectorcall_Call () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #19 0x00007f98c2332f48 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #20 0x00007f98c233506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #21 0x00007f98c2329d6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #22 0x00007f98c232b018 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #23 0x00007f98c233506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #24 0x00007f98c2329d6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #25 0x00007f98c232b018 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #26 0x00007f98c233506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #27 0x00007f98c2329d6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #28 0x00007f98c232b018 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #29 0x00007f98c233506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #30 0x00007f98c2329d6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #31 0x00007f98c232b018 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #32 0x00007f98c247fe3b in _PyEval_EvalCodeWithName () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #33 0x00007f98c255d114 in _PyFunction_Vectorcall () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #34 0x00007f98c255d830 in PyVectorcall_Call () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #35 0x00005594dc31f565 in pybind11::object pybind11::detail::object_api<pybind11::detail::accessor<pybind11::detail::accessor_policies::str_attr> >::operator()<(pybind11::return_value_policy)1, pybind11::object&>(pybind11::object&) const ()
    #36 0x00005594dc313c95 in triton::backend::python::Stub::Execute(triton::backend::python::RequestBatch*, triton::backend::python::ResponseBatch*, long*) ()
    #37 0x00005594dc317bf6 in triton::backend::python::Stub::RunCommand() ()
    #38 0x00005594dc2fd160 in main ()

Tsingjie89 · Sep 08 '22 12:09

Hi @Tsingjie89, thanks for sharing the config and backtrace. Are you able to reproduce this with our newest container? Could you also share the model you are using and the steps to reproduce the issue? That will really help us investigate this further.

krishung5 · Sep 08 '22 23:09

Hi @krishung5, I used the container nvcr.io/nvidia/tritonserver:22.08-py3 with the same model config and got this core dump:

    Core was generated by `/opt/tritonserver/backends/python/triton_python_backend_stub /workspace/ocr_lit'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x0000557252ba953e in boost::intrusive::multiset_impl<boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, void, void, unsigned long, true, void>::insert(boost::intrusive::tree_iterator<boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, true>, boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl&) ()
    [Current thread is 1 (Thread 0x7f1ca37fe000 (LWP 24676))]
    (gdb) bt
    #0  0x0000557252ba953e in boost::intrusive::multiset_impl<boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, void, void, unsigned long, true, void>::insert(boost::intrusive::tree_iterator<boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, true>, boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl&) ()
    #1  0x0000557252baa44a in boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::priv_deallocate(void*) ()
    #2  0x0000557252baa934 in std::_Function_handler<void (char*), triton::backend::python::SharedMemoryManager::WrapObjectInUniquePtr(char*, triton::backend::python::AllocatedShmOwnership*, long const&)::{lambda(char*)#1}>::_M_invoke(std::_Any_data const&, char*&&) ()
    #3  0x0000557252bddbdd in triton::backend::python::PbTensor::~PbTensor() ()
    #4  0x0000557252be16d1 in std::_Sp_counted_ptr_inplace<triton::backend::python::PbTensor, std::allocator<triton::backend::python::PbTensor>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
    #5  0x0000557252bbe7ed in triton::backend::python::InferRequest::~InferRequest() ()
    #6  0x0000557252bae750 in pybind11::cpp_function::initialize<triton::backend::python::pybind11_init_c_python_backend_utils(pybind11::module&)::{lambda(std::shared_ptr<triton::backend::python::InferRequest>&)#2}::operator()(std::shared_ptr<triton::backend::python::InferRequest>&) const::{lambda()#1}, std::shared_ptr<triton::backend::python::InferResponse> >(triton::backend::python::pybind11_init_c_python_backend_utils(pybind11::module&)::{lambda(std::shared_ptr<triton::backend::python::InferRequest>&)#2}::operator()(std::shared_ptr<triton::backend::python::InferRequest>&) const::{lambda()#1}&&, std::shared_ptr<triton::backend::python::InferResponse> ()())::{lambda(pybind11::detail::function_record)#1}::_FUN(pybind11::detail) ()
    #7  0x0000557252b93985 in pybind11::cpp_function::initialize_generic(std::unique_ptr<pybind11::detail::function_record, pybind11::cpp_function::InitializingFunctionRecordDeleter>&&, char const*, std::type_info const* const*, unsigned long)::{lambda(void*)#1}::_FUN(void*) ()
    #8  0x00007f1f7bb0abb3 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #9  0x00007f1f7bac6245 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #10 0x00007f1f7bad5774 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #11 0x00007f1f7baadfbf in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #12 0x00007f1f7b8e1cb0 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #13 0x00007f1f7b8e506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #14 0x00007f1f7bb0d830 in PyVectorcall_Call () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #15 0x00007f1f7b8dfa7a in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #16 0x00007f1f7b8e506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #17 0x00007f1f7b8d9d6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #18 0x00007f1f7b8db018 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #19 0x00007f1f7b8e506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #20 0x00007f1f7b8d9d6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #21 0x00007f1f7b8db018 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #22 0x00007f1f7b8e506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #23 0x00007f1f7bb0de2b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #24 0x00007f1f7bb0d830 in PyVectorcall_Call () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #25 0x00007f1f7b97bc01 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #26 0x00007f1f7b9e251b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
    #27 0x00007f1f7b64b609 in start_thread (arg=) at pthread_create.c:477
    #28 0x00007f1f7b570133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Tsingjie89 · Sep 13 '22 10:09

Thank you @Tsingjie89 for the reply. Apart from the model config you shared, we would also need the model file (model.py in this case) to reproduce the issue. Could you also provide the steps to reproduce it, i.e. the full tritonserver ... command you ran to get the core dump?
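For packaging a reproduction like the one requested here: a Triton model repository has one directory per model, each holding a `config.pbtxt` and numbered version subdirectories, and a Python backend model keeps its `model.py` in the version directory (for example `ocr_lite_rec/1/model.py`). The sketch below creates that skeleton for the two models in this issue. `write_repo` is a hypothetical helper, the config strings are abbreviated to their header fields, and the Paddle model files and `model.py` would still need to be copied in by hand.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Abbreviated configs: paste the full input/output/instance_group/optimization
# sections from the issue into these strings before sharing the repository.
CONFIGS: Dict[str, str] = {
    "rec_ch_cpu": 'name: "rec_ch_cpu"\nbackend: "paddle"\nmax_batch_size: 6\n',
    "ocr_lite_rec": 'name: "ocr_lite_rec"\nbackend: "python"\n',
}


def write_repo(root: Path, configs: Dict[str, str], version: str = "1") -> List[Path]:
    """Create <root>/<model>/config.pbtxt and an empty <root>/<model>/<version>/ dir."""
    written = []
    for name, text in configs.items():
        model_dir = root / name
        (model_dir / version).mkdir(parents=True, exist_ok=True)
        config_path = model_dir / "config.pbtxt"
        config_path.write_text(text)
        written.append(config_path)
    return written


if __name__ == "__main__":
    repo = Path(tempfile.mkdtemp()) / "model_repository"
    for path in write_repo(repo, CONFIGS):
        print(path.relative_to(repo))
```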

krishung5 · Sep 13 '22 19:09

Closing due to inactivity. Please let us know to reopen the issue if you'd like to follow up.

dyastremsky · Sep 30 '22 22:09