triton-inference-server cannot be started
Description
NAME                                       READY   STATUS             RESTARTS      AGE
jupyter-notebook-server-5f785cd7c8-x8qd6   1/1     Running            0             45m
llm-playground-7d8c999487-fgmj5            1/1     Running            0             45m
milvu-etcd-7cf545456f-m8q9m                1/1     Running            0             45m
milvus-minio-7ff64c76f-4njkz               1/1     Running            0             45m
milvus-standalone-7479bf9ddd-n6s6f         1/1     Running            0             45m
query-router-65c6f864ff-fstkb              1/1     Running            0             45m
triton-inference-server-7cd84c8f4b-wzsk9   0/1     CrashLoopBackOff   8 (18s ago)   23m
[triton-inference-server-7cd84c8f4b-wzsk9:30 :0:30] Caught signal 7 (Bus error: nonexistent physical address)
backtrace (tid: 30)
0 0x0000000000042520 __sigaction() ???:0
1 0x000000000001678b uct_iface_mp_chunk_alloc_inner() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/base/uct_mem.c:469
2 0x000000000001678b uct_iface_mp_chunk_alloc() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/base/uct_mem.c:443
3 0x000000000005407b ucs_mpool_grow() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucs/datastruct/mpool.c:266
4 0x00000000000542c9 ucs_mpool_get_grow() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucs/datastruct/mpool.c:312
5 0x000000000001b488 uct_mm_iface_t_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/sm/mm/base/mm_iface.c:822
6 0x000000000001b9f2 uct_mm_iface_t_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/sm/mm/base/mm_iface.c:720
7 0x0000000000014f02 uct_iface_open() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/base/uct_md.c:284
8 0x000000000004a017 ucp_worker_iface_open() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucp/core/ucp_worker.c:1357
9 0x000000000004afe0 ucp_worker_add_resource_ifaces() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucp/core/ucp_worker.c:1101
10 0x000000000004d2db ucp_worker_create() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucp/core/ucp_worker.c:2441
11 0x000000000000702f mca_pml_ucx_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/mca/pml/ucx/pml_ucx.c:306
12 0x00000000000093a5 mca_pml_ucx_component_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/mca/pml/ucx/pml_ucx_component.c:136
13 0x00000000000c7022 mca_pml_base_select() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/mca/pml/base/pml_base_select.c:127
14 0x00000000000d01c9 ompi_mpi_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/runtime/ompi_mpi_init.c:647
15 0x0000000000075899 PMPI_Init_thread() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/mpi/c/profile/pinit_thread.c:69
16 0x00000000000327a8 __pyx_f_6mpi4py_3MPI_bootstrap() /tmp/pip-install-05lukizf/mpi4py_8cc4cad65d414a8995a9d1c890fac173/src/mpi4py.MPI.c:8115
17 0x00000000000327a8 __pyx_pymod_exec_MPI() /tmp/pip-install-05lukizf/mpi4py_8cc4cad65d414a8995a9d1c890fac173/src/mpi4py.MPI.c:176976
18 0x000000000023b2d3 PyModule_ExecDef() ???:0
19 0x000000000023bda0 PyInit__thread() ???:0
20 0x000000000015f854 PyObject_GenericGetAttr() ???:0
21 0x000000000014b2c1 _PyEval_EvalFrameDefault() ???:0
22 0x000000000016070c _PyFunction_Vectorcall() ???:0
23 0x000000000014e8a2 _PyEval_EvalFrameDefault() ???:0
24 0x000000000016070c _PyFunction_Vectorcall() ???:0
25 0x0000000000148f52 _PyEval_EvalFrameDefault() ???:0
26 0x000000000016070c _PyFunction_Vectorcall() ???:0
27 0x0000000000148e0d _PyEval_EvalFrameDefault() ???:0
28 0x000000000016070c _PyFunction_Vectorcall() ???:0
29 0x0000000000148e0d _PyEval_EvalFrameDefault() ???:0
30 0x000000000016070c _PyFunction_Vectorcall() ???:0
31 0x000000000015fb24 PyObject_CallFunctionObjArgs() ???:0
32 0x000000000023f4af _PyObject_CallMethodIdObjArgs() ???:0
33 0x00000000001740ca PyImport_ImportModuleLevelObject() ???:0
34 0x0000000000184458 PyImport_Import() ???:0
35 0x000000000015fe0e PyObject_CallFunctionObjArgs() ???:0
36 0x000000000016f12b PyObject_Call() ???:0
37 0x000000000014b2c1 _PyEval_EvalFrameDefault() ???:0
38 0x000000000016070c _PyFunction_Vectorcall() ???:0
39 0x0000000000148e0d _PyEval_EvalFrameDefault() ???:0
40 0x000000000016070c _PyFunction_Vectorcall() ???:0
41 0x000000000015fb24 PyObject_CallFunctionObjArgs() ???:0
42 0x000000000023f4af _PyObject_CallMethodIdObjArgs() ???:0
43 0x0000000000174cda PyImport_ImportModuleLevelObject() ???:0
44 0x000000000014b9e5 _PyEval_EvalFrameDefault() ???:0
45 0x0000000000239e56 PyEval_EvalCode() ???:0
46 0x0000000000239cf6 PyEval_EvalCode() ???:0
47 0x000000000023fb0d PyFrozenSet_New() ???:0
48 0x0000000000160969 PyCell_New() ???:0
49 0x000000000014b2c1 _PyEval_EvalFrameDefault() ???:0
50 0x000000000016070c _PyFunction_Vectorcall() ???:0
51 0x000000000014e8a2 _PyEval_EvalFrameDefault() ???:0
52 0x000000000016070c _PyFunction_Vectorcall() ???:0
53 0x0000000000148f52 _PyEval_EvalFrameDefault() ???:0
54 0x000000000016070c _PyFunction_Vectorcall() ???:0
55 0x0000000000148e0d _PyEval_EvalFrameDefault() ???:0
56 0x000000000016070c _PyFunction_Vectorcall() ???:0
[triton-inference-server-7cd84c8f4b-wzsk9:00030] *** Process received signal ***
[triton-inference-server-7cd84c8f4b-wzsk9:00030] Signal: Bus error (7)
[triton-inference-server-7cd84c8f4b-wzsk9:00030] Signal code: (-6)
[triton-inference-server-7cd84c8f4b-wzsk9:00030] Failing at address: 0x1e
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f9d7caa7520]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 1] /opt/hpcx/ucx/lib/libuct.so.0(uct_iface_mp_chunk_alloc+0x7b)[0x7f9d3689178b]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 2] /opt/hpcx/ucx/lib/libucs.so.0(ucs_mpool_grow+0x7b)[0x7f9d3691607b]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 3] /opt/hpcx/ucx/lib/libucs.so.0(ucs_mpool_get_grow+0x19)[0x7f9d369162c9]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 4] /opt/hpcx/ucx/lib/libuct.so.0(+0x1b488)[0x7f9d36896488]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 5] /opt/hpcx/ucx/lib/libuct.so.0(uct_mm_iface_t_new+0xb2)[0x7f9d368969f2]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 6] /opt/hpcx/ucx/lib/libuct.so.0(uct_iface_open+0xe2)[0x7f9d3688ff02]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 7] /opt/hpcx/ucx/lib/libucp.so.0(ucp_worker_iface_open+0x317)[0x7f9d36a93017]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 8] /opt/hpcx/ucx/lib/libucp.so.0(+0x4afe0)[0x7f9d36a93fe0]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 9] /opt/hpcx/ucx/lib/libucp.so.0(ucp_worker_create+0x7cb)[0x7f9d36a962db]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [10] /opt/hpcx/ompi/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_init+0x9f)[0x7f9d36b2f02f]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [11] /opt/hpcx/ompi/lib/openmpi/mca_pml_ucx.so(+0x93a5)[0x7f9d36b313a5]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [12] /opt/hpcx/ompi/lib/libmpi.so.40(mca_pml_base_select+0x1e2)[0x7f9c1bc35022]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [13] /opt/hpcx/ompi/lib/libmpi.so.40(ompi_mpi_init+0x6c9)[0x7f9c1bc3e1c9]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [14] /opt/hpcx/ompi/lib/libmpi.so.40(PMPI_Init_thread+0x79)[0x7f9c1bbe3899]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [15] /usr/local/lib/python3.10/dist-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x327a8)[0x7f9c1bcbf7a8]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [16] /usr/bin/python3(PyModule_ExecDef+0x73)[0x55f3c471e2d3]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [17] /usr/bin/python3(+0x23bda0)[0x55f3c471eda0]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [18] /usr/bin/python3(+0x15f854)[0x55f3c4642854]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [19] /usr/bin/python3(_PyEval_EvalFrameDefault+0x2b71)[0x55f3c462e2c1]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [20] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [21] /usr/bin/python3(_PyEval_EvalFrameDefault+0x6152)[0x55f3c46318a2]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [22] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [23] /usr/bin/python3(_PyEval_EvalFrameDefault+0x802)[0x55f3c462bf52]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [24] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [25] /usr/bin/python3(_PyEval_EvalFrameDefault+0x6bd)[0x55f3c462be0d]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [26] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [27] /usr/bin/python3(_PyEval_EvalFrameDefault+0x6bd)[0x55f3c462be0d]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [28] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [29] /usr/bin/python3(+0x15fb24)[0x55f3c4642b24]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] *** End of error message ***
[23] May 29 04:16:21 [ ERROR] - main - TensorRT conversion returned a non-zero exit code.
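For readers skimming the trace, the native call chain can be pulled out of the first backtrace with a few lines of Python. This is only a minimal sketch: the names `FRAME_RE`, `parse_frames` and `native_frames` are hypothetical helpers, not part of Triton, UCX or Open MPI, and the sample frames are copied from the log above with the long build paths shortened to `...` for readability.

```python
import re

# A few frames copied from the backtrace above, one per line.
# Assumption: build paths are shortened to "..." here for readability only.
SAMPLE = """\
0 0x0000000000042520 __sigaction() ???:0
1 0x000000000001678b uct_iface_mp_chunk_alloc_inner() .../src/uct/base/uct_mem.c:469
5 0x000000000001b488 uct_mm_iface_t_init() .../src/uct/sm/mm/base/mm_iface.c:822
10 0x000000000004d2db ucp_worker_create() .../src/ucp/core/ucp_worker.c:2441
14 0x00000000000d01c9 ompi_mpi_init() .../ompi/runtime/ompi_mpi_init.c:647
16 0x00000000000327a8 __pyx_f_6mpi4py_3MPI_bootstrap() .../src/mpi4py.MPI.c:8115
18 0x000000000023b2d3 PyModule_ExecDef() ???:0
"""

# Frame layout seen in the log: "<index> <address> <function>() <file>:<line>"
FRAME_RE = re.compile(r"^\s*(\d+)\s+(0x[0-9a-fA-F]+)\s+(\S+)\(\)\s+(\S+)$")


def parse_frames(text):
    """Return (index, address, function, location) tuples for each frame line."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            index, address, func, location = match.groups()
            frames.append((int(index), address, func, location))
    return frames


def native_frames(frames):
    """Keep only frames with a resolved source location (not '???:0')."""
    return [frame for frame in frames if frame[3] != "???:0"]


if __name__ == "__main__":
    for index, _address, func, location in native_frames(parse_frames(SAMPLE)):
        print(f"{index:>2}  {func}  ({location})")
```

Read innermost first, the resolved frames show the crash happening while mpi4py is being imported: MPI initialization (`ompi_mpi_init`) creates a UCX worker (`ucp_worker_create`), which opens a `uct_mm` interface and fails in `uct_iface_mp_chunk_alloc_inner`.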
Triton Information
What version of Triton are you using?
Are you using the Triton container or did you build it yourself?
To Reproduce
Steps to reproduce the behavior.
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
Expected behavior
A clear and concise description of what you expected to happen.
Kindly fill in the information required to reproduce the issue, including the version of Triton used.
Closing due to lack of activity. Please re-open the issue if you would like to follow up.