
[BUG] terminate called after throwing an instance of 'std::runtime_error' what(): [FT][ERROR] Assertion fail: /home/local/llama_inference/llm_serve/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/triton_backend/llama/LlamaTritonModel.cc:89

Open highinsky opened this issue 1 year ago • 2 comments

是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?

  • [X] 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions

该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?

  • [X] 我已经搜索过FAQ | I have searched FAQ

当前行为 | Current Behavior

After running `bash run.sh`, it hangs for a long time and eventually reports: "Triton server startup timed out; please enter the container and check model_repos/QAEnsemble_base/QAEnsemble_base.log for more information." Checking the QAEnsemble.log file under models then reveals the error below.
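For reference, a minimal helper for scanning such a log for the first fatal line (the function name is my own; the log path comes from the timeout message above and may differ in your deployment) could look like:

```shell
# Minimal sketch: print the first fatal error (with two lines of context)
# from a Triton/FasterTransformer log file.
find_fatal() {
    grep -n -m1 -B2 -A2 -E '\[FT\]\[ERROR\]|terminate called' "$1"
}

# Usage: path taken from the timeout message above; adjust to your setup.
# find_fatal /model_repos/QAEnsemble_base/QAEnsemble_base.log
```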

期望行为 | Expected Behavior

`bash run.sh` should start the container normally, and the service should be reachable on port 5052.

运行环境 | Environment

OS: Windows 11 WSL2(ubuntu 22.04)
NVIDIA Driver: 546.65
CUDA: 12.3
Docker Desktop: v4.26.1
NVIDIA GPU Memory: 8GB

QAnything日志 | QAnything logs

I0123 10:46:22.146296 91 cache_manager.cc:478] Create CacheManager with cache_dir: '/opt/tritonserver/caches' I0123 10:46:23.234552 91 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x205000000' with size 268435456 I0123 10:46:23.235645 91 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864 I0123 10:46:27.309790 91 model_config_utils.cc:647] Server side auto-completed config: name: "base" max_batch_size: 1 input { name: "input_ids" data_type: TYPE_UINT32 dims: -1 allow_ragged_batch: true } input { name: "start_id" data_type: TYPE_UINT32 dims: 1 reshape { } optional: true } input { name: "end_id" data_type: TYPE_UINT32 dims: 1 reshape { } optional: true } input { name: "input_lengths" data_type: TYPE_UINT32 dims: 1 reshape { } } input { name: "request_output_len" data_type: TYPE_UINT32 dims: -1 } input { name: "runtime_top_k" data_type: TYPE_UINT32 dims: 1 reshape { } optional: true } input { name: "runtime_top_p" data_type: TYPE_FP32 dims: 1 reshape { } optional: true } input { name: "beam_search_diversity_rate" data_type: TYPE_FP32 dims: 1 reshape { } optional: true } input { name: "temperature" data_type: TYPE_FP32 dims: 1 reshape { } optional: true } input { name: "len_penalty" data_type: TYPE_FP32 dims: 1 reshape { } optional: true } input { name: "repetition_penalty" data_type: TYPE_FP32 dims: 1 reshape { } optional: true } input { name: "random_seed" data_type: TYPE_UINT64 dims: 1 reshape { } optional: true } input { name: "is_return_log_probs" data_type: TYPE_BOOL dims: 1 reshape { } optional: true } input { name: "beam_width" data_type: TYPE_UINT32 dims: 1 reshape { } optional: true } input { name: "bad_words_list" data_type: TYPE_INT32 dims: 2 dims: -1 optional: true } input { name: "stop_words_list" data_type: TYPE_INT32 dims: 2 dims: -1 optional: true } input { name: "prompt_learning_task_name_ids" data_type: TYPE_UINT32 dims: 1 reshape { } optional: true } input { name: "top_p_decay" data_type: 
TYPE_FP32 dims: 1 reshape { } optional: true } input { name: "top_p_min" data_type: TYPE_FP32 dims: 1 reshape { } optional: true } input { name: "top_p_reset_ids" data_type: TYPE_UINT32 dims: 1 reshape { } optional: true } output { name: "output_ids" data_type: TYPE_UINT32 dims: -1 dims: -1 } output { name: "sequence_length" data_type: TYPE_UINT32 dims: -1 } output { name: "cum_log_probs" data_type: TYPE_FP32 dims: -1 } output { name: "output_log_probs" data_type: TYPE_FP32 dims: -1 dims: -1 } instance_group { count: 1 kind: KIND_CPU } default_model_filename: "base" dynamic_batching { max_queue_delay_microseconds: 50000 } parameters { key: "data_type" value { string_value: "fp16" } } parameters { key: "enable_custom_all_reduce" value { string_value: "0" } } parameters { key: "int8_mode" value { string_value: "1" } } parameters { key: "model_checkpoint_path" value { string_value: "/model_repos/QAEnsemble/base/1/1-gpu/" } } parameters { key: "model_type" value { string_value: "Llama" } } parameters { key: "pipeline_para_size" value { string_value: "1" } } parameters { key: "tensor_para_size" value { string_value: "1" } } backend: "qa_ensemble" model_transaction_policy { decoupled: true } batch_input { kind: BATCH_ITEM_SHAPE target_name: "input_ids_item_shape" data_type: TYPE_INT32 source_input: "input_ids" }

I0123 10:46:27.405802 91 model_config_utils.cc:647] Server side auto-completed config: name: "embed" platform: "onnxruntime_onnx" max_batch_size: 16 input { name: "token_type_ids" data_type: TYPE_INT64 dims: -1 } input { name: "attention_mask" data_type: TYPE_INT64 dims: -1 } input { name: "input_ids" data_type: TYPE_INT64 dims: -1 } output { name: "1607" data_type: TYPE_FP32 dims: 768 } output { name: "output" data_type: TYPE_FP32 dims: -1 dims: 768 } instance_group { name: "embed" count: 1 gpus: 0 kind: KIND_GPU } default_model_filename: "model.onnx" backend: "onnxruntime"

I0123 10:46:27.506715 91 model_config_utils.cc:647] Server side auto-completed config: name: "rerank" platform: "onnxruntime_onnx" max_batch_size: 16 input { name: "attention_mask" data_type: TYPE_INT64 dims: -1 } input { name: "token_type_ids" data_type: TYPE_INT64 dims: -1 } input { name: "input_ids" data_type: TYPE_INT64 dims: -1 } output { name: "logits" data_type: TYPE_FP16 dims: 1 } instance_group { name: "rerank" count: 1 gpus: 0 kind: KIND_GPU } default_model_filename: "model.onnx" backend: "onnxruntime"

I0123 10:46:27.514646 91 model_lifecycle.cc:462] loading: rerank:1 I0123 10:46:27.521593 91 backend_model.cc:362] Adding default backend config setting: default-max-batch-size,4 I0123 10:46:27.522299 91 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so I0123 10:46:27.524456 91 onnxruntime.cc:2504] TRITONBACKEND_Initialize: onnxruntime I0123 10:46:27.525273 91 onnxruntime.cc:2514] Triton TRITONBACKEND API version: 1.12 I0123 10:46:27.526272 91 onnxruntime.cc:2520] 'onnxruntime' TRITONBACKEND API version: 1.12 I0123 10:46:27.526966 91 model_lifecycle.cc:462] loading: embed:1 I0123 10:46:27.527105 91 onnxruntime.cc:2550] backend configuration: {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} I0123 10:46:27.532334 91 backend_model.cc:362] Adding default backend config setting: default-max-batch-size,4 I0123 10:46:27.538006 91 model_lifecycle.cc:462] loading: base:1 I0123 10:46:27.543958 91 backend_model.cc:362] Adding default backend config setting: default-max-batch-size,4 I0123 10:46:27.544678 91 onnxruntime.cc:2608] TRITONBACKEND_ModelInitialize: embed (version 1) I0123 10:46:27.544679 91 onnxruntime.cc:2608] TRITONBACKEND_ModelInitialize: rerank (version 1) I0123 10:46:27.544696 91 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/qa_ensemble/libtriton_qa_ensemble.so I0123 10:46:27.546190 91 model_config_utils.cc:1839] ModelConfig 64-bit fields: I0123 10:46:27.548000 91 model_config_utils.cc:1841] ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds I0123 10:46:27.548654 91 model_config_utils.cc:1841] ModelConfig::dynamic_batching::max_queue_delay_microseconds I0123 10:46:27.549343 91 model_config_utils.cc:1841] ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds I0123 10:46:27.550300 91 model_config_utils.cc:1841] 
ModelConfig::ensemble_scheduling::step::model_version I0123 10:46:27.551567 91 model_config_utils.cc:1841] ModelConfig::input::dims I0123 10:46:27.552558 91 model_config_utils.cc:1841] ModelConfig::input::reshape::shape I0123 10:46:27.553423 91 model_config_utils.cc:1841] ModelConfig::instance_group::secondary_devices::device_id I0123 10:46:27.554198 91 model_config_utils.cc:1841] ModelConfig::model_warmup::inputs::value::dims I0123 10:46:27.554934 91 model_config_utils.cc:1841] ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim I0123 10:46:27.555914 91 model_config_utils.cc:1841] ModelConfig::optimization::cuda::graph_spec::input::value::dim I0123 10:46:27.556738 91 model_config_utils.cc:1841] ModelConfig::output::dims I0123 10:46:27.557425 91 model_config_utils.cc:1841] ModelConfig::output::reshape::shape I0123 10:46:27.558244 91 model_config_utils.cc:1841] ModelConfig::sequence_batching::direct::max_queue_delay_microseconds I0123 10:46:27.559788 91 model_config_utils.cc:1841] ModelConfig::sequence_batching::max_sequence_idle_microseconds I0123 10:46:27.561145 91 model_config_utils.cc:1841] ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds I0123 10:46:27.562226 91 model_config_utils.cc:1841] ModelConfig::sequence_batching::state::dims I0123 10:46:27.563091 91 model_config_utils.cc:1841] ModelConfig::sequence_batching::state::initial_state::dims I0123 10:46:27.564123 91 model_config_utils.cc:1841] ModelConfig::version_policy::specific::versions I0123 10:46:27.565304 91 onnxruntime.cc:666] skipping model configuration auto-complete for 'embed': inputs and outputs already specified I0123 10:46:27.565324 91 onnxruntime.cc:666] skipping model configuration auto-complete for 'rerank': inputs and outputs already specified I0123 10:46:27.571452 91 onnxruntime.cc:2651] TRITONBACKEND_ModelInstanceInitialize: embed (GPU device 0) I0123 10:46:27.572770 91 backend_model_instance.cc:105] Creating instance embed on GPU 0 (8.9) 
using artifact 'model.onnx' I0123 10:46:27.572862 91 onnxruntime.cc:2651] TRITONBACKEND_ModelInstanceInitialize: rerank (GPU device 0) I0123 10:46:27.574404 91 backend_model_instance.cc:105] Creating instance rerank on GPU 0 (8.9) using artifact 'model.onnx' I0123 10:46:27.609262 91 onnxruntime.cc:553] CUDA Execution Accelerator is set for 'embed' on device 0 2024-01-23 18:46:27.610434750 [I:onnxruntime:, inference_session.cc:271 operator()] Flush-to-zero and denormal-as-zero are off 2024-01-23 18:46:27.611009267 [I:onnxruntime:, inference_session.cc:279 ConstructorCommon] Creating and using per session threadpools since use_per_session_threads_ is true 2024-01-23 18:46:27.611487659 [I:onnxruntime:, inference_session.cc:297 ConstructorCommon] Dynamic block base set to 0 I0123 10:46:27.611815 91 onnxruntime.cc:553] CUDA Execution Accelerator is set for 'rerank' on device 0 2024-01-23 18:46:27.613208123 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 680, index: 0, mask: {2, 3, } 2024-01-23 18:46:27.613278646 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 681, index: 1, mask: {4, 5, } 2024-01-23 18:46:27.613498790 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 682, index: 2, mask: {6, 7, } 2024-01-23 18:46:27.613506331 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 683, index: 3, mask: {8, 9, } 2024-01-23 18:46:27.613572767 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 684, index: 4, mask: {10, 11, } 2024-01-23 18:46:27.614362624 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 685, index: 5, mask: {12, 13, } 2024-01-23 18:46:27.614924025 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 686, index: 6, mask: {14, 15, } 2024-01-23 18:46:27.615095580 [V:onnxruntime:log, env.cc:246 ThreadMain] 
pthread_setaffinity_np succeed for thread: 687, index: 7, mask: {16, 17, } 2024-01-23 18:46:27.625392156 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 690, index: 10, mask: {22, 23, } 2024-01-23 18:46:27.625307692 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 689, index: 9, mask: {20, 21, } 2024-01-23 18:46:27.625455150 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 688, index: 8, mask: {18, 19, } 2024-01-23 18:46:27.625564354 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 691, index: 11, mask: {24, 25, } 2024-01-23 18:46:27.625701625 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 693, index: 13, mask: {28, 29, } 2024-01-23 18:46:27.625727190 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 694, index: 14, mask: {30, 31, } 2024-01-23 18:46:27.625805541 [V:onnxruntime:log, env.cc:246 ThreadMain] pthread_setaffinity_np succeed for thread: 692, index: 12, mask: {26, 27, } I0123 10:46:27.960071 91 libfastertransformer.cc:1848] TRITONBACKEND_Initialize: qa_ensemble I0123 10:46:27.961065 91 libfastertransformer.cc:1858] Triton TRITONBACKEND API version: 1.12 I0123 10:46:27.961748 91 libfastertransformer.cc:1864] 'qa_ensemble' TRITONBACKEND API version: 1.10 I0123 10:46:27.962404 91 libfastertransformer.cc:1896] TRITONBACKEND_ModelInitialize: base (version 1) I0123 10:46:27.964034 91 libfastertransformer.cc:364] model configuration: { "name": "base", "platform": "", "backend": "qa_ensemble", "version_policy": { "latest": { "num_versions": 1 } }, "max_batch_size": 1, "input": [ { "name": "input_ids", "data_type": "TYPE_UINT32", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, "allow_ragged_batch": true, "optional": false }, { "name": "start_id", "data_type": "TYPE_UINT32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { 
"shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "end_id", "data_type": "TYPE_UINT32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "input_lengths", "data_type": "TYPE_UINT32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "request_output_len", "data_type": "TYPE_UINT32", "format": "FORMAT_NONE", "dims": [ -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false }, { "name": "runtime_top_k", "data_type": "TYPE_UINT32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "runtime_top_p", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "beam_search_diversity_rate", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "temperature", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "len_penalty", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "repetition_penalty", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "random_seed", "data_type": "TYPE_UINT64", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, 
"allow_ragged_batch": false, "optional": true }, { "name": "is_return_log_probs", "data_type": "TYPE_BOOL", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "beam_width", "data_type": "TYPE_UINT32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "bad_words_list", "data_type": "TYPE_INT32", "format": "FORMAT_NONE", "dims": [ 2, -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "stop_words_list", "data_type": "TYPE_INT32", "format": "FORMAT_NONE", "dims": [ 2, -1 ], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "prompt_learning_task_name_ids", "data_type": "TYPE_UINT32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "top_p_decay", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "top_p_min", "data_type": "TYPE_FP32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true }, { "name": "top_p_reset_ids", "data_type": "TYPE_UINT32", "format": "FORMAT_NONE", "dims": [ 1 ], "reshape": { "shape": [] }, "is_shape_tensor": false, "allow_ragged_batch": false, "optional": true } ], "output": [ { "name": "output_ids", "data_type": "TYPE_UINT32", "dims": [ -1, -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "sequence_length", "data_type": "TYPE_UINT32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": "cum_log_probs", "data_type": "TYPE_FP32", "dims": [ -1 ], "label_filename": "", "is_shape_tensor": false }, { "name": 
"output_log_probs", "data_type": "TYPE_FP32", "dims": [ -1, -1 ], "label_filename": "", "is_shape_tensor": false } ], "batch_input": [ { "kind": "BATCH_ITEM_SHAPE", "target_name": [ "input_ids_item_shape" ], "data_type": "TYPE_INT32", "source_input": [ "input_ids" ] } ], "batch_output": [], "optimization": { "priority": "PRIORITY_DEFAULT", "input_pinned_memory": { "enable": true }, "output_pinned_memory": { "enable": true }, "gather_kernel_buffer_threshold": 0, "eager_batching": false }, "dynamic_batching": { "preferred_batch_size": [ 1 ], "max_queue_delay_microseconds": 50000, "preserve_ordering": false, "priority_levels": 0, "default_priority_level": 0, "priority_queue_policy": {} }, "instance_group": [ { "name": "base_0", "kind": "KIND_CPU", "count": 1, "gpus": [], "secondary_devices": [], "profile": [], "passive": false, "host_policy": "" } ], "default_model_filename": "base", "cc_model_filenames": {}, "metric_tags": {}, "parameters": { "model_type": { "string_value": "Llama" }, "pipeline_para_size": { "string_value": "1" }, "data_type": { "string_value": "fp16" }, "model_checkpoint_path": { "string_value": "/model_repos/QAEnsemble/base/1/1-gpu/" }, "int8_mode": { "string_value": "1" }, "tensor_para_size": { "string_value": "1" }, "enable_custom_all_reduce": { "string_value": "0" } }, "model_warmup": [], "model_transaction_policy": { "decoupled": true } } I0123 10:46:27.965026 91 libfastertransformer.cc:387] Instance group type: KIND_CPU count: 1 I0123 10:46:27.965873 91 libfastertransformer.cc:417] Sequence Batching: disabled I0123 10:46:27.966620 91 libfastertransformer.cc:427] Dynamic Batching: enabled terminate called after throwing an instance of 'std::runtime_error' what(): [FT][ERROR] Assertion fail: /home/local/llama_inference/llm_serve/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/triton_backend/llama/LlamaTritonModel.cc:89

[65ef5ebd6c73:00091] *** Process received signal *** [65ef5ebd6c73:00091] Signal: Aborted (6) [65ef5ebd6c73:00091] Signal code: (-6) [65ef5ebd6c73:00091] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f8a2ce94520] [65ef5ebd6c73:00091] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f8a2cee89fc] [65ef5ebd6c73:00091] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f8a2ce94476] [65ef5ebd6c73:00091] [ 3] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f8a2ce7a7f3] [65ef5ebd6c73:00091] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e)[0x7f8a2d11db9e] [65ef5ebd6c73:00091] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c)[0x7f8a2d12920c] [65ef5ebd6c73:00091] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277)[0x7f8a2d129277] [65ef5ebd6c73:00091] [ 7] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8)[0x7f8a2d1294d8] [65ef5ebd6c73:00091] [ 8] /opt/tritonserver/backends/qa_ensemble/libtransformer-shared.so(_ZN17fastertransformer17throwRuntimeErrorEPKciRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x27f)[0x7f89e37c4c5f] [65ef5ebd6c73:00091] [ 9] /opt/tritonserver/backends/qa_ensemble/libtransformer-shared.so(_ZN16LlamaTritonModelI6__halfEC1EmmiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEi+0x70e)[0x7f89e3a9fdae] [65ef5ebd6c73:00091] [10] /opt/tritonserver/backends/qa_ensemble/libtriton_qa_ensemble.so(+0x275ca)[0x7f8a13be05ca] [65ef5ebd6c73:00091] [11] /opt/tritonserver/backends/qa_ensemble/libtriton_qa_ensemble.so(+0x2f0d9)[0x7f8a13be80d9] [65ef5ebd6c73:00091] [12] /opt/tritonserver/backends/qa_ensemble/libtriton_qa_ensemble.so(+0x35f56)[0x7f8a13beef56] [65ef5ebd6c73:00091] [13] /opt/tritonserver/backends/qa_ensemble/libtriton_qa_ensemble.so(TRITONBACKEND_ModelInitialize+0x4c2)[0x7f8a13bef732] [65ef5ebd6c73:00091] [14] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x174497)[0x7f8a2d85e497] [65ef5ebd6c73:00091] [15] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x252d80)[0x7f8a2d93cd80] 
[65ef5ebd6c73:00091] [16] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x2563c3)[0x7f8a2d9403c3] [65ef5ebd6c73:00091] [17] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x39a052)[0x7f8a2da84052] [65ef5ebd6c73:00091] [18] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253)[0x7f8a2d157253] [65ef5ebd6c73:00091] [19] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f8a2cee6ac3] [65ef5ebd6c73:00091] [20] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x126850)[0x7f8a2cf78850] [65ef5ebd6c73:00091] *** End of error message ***
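The key frames in the trace are the mangled C++ symbols from libtransformer-shared.so. One way to make the backtrace readable is to demangle them with `c++filt` (from binutils); for example, frame [9] above demangles to the `LlamaTritonModel<__half>` constructor, which is where the assertion was thrown:

```shell
# Demangle frame [9] of the backtrace above with c++filt (binutils).
# The output shows the crash occurs in the LlamaTritonModel<__half> constructor.
echo '_ZN16LlamaTritonModelI6__halfEC1EmmiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEi' | c++filt
```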

复现方法 | Steps To Reproduce


备注 | Anything else?

No response

highinsky avatar Jan 23 '24 11:01 highinsky

Same problem here. How can it be solved?

thunder95 avatar Jan 25 '24 02:01 thunder95

Same problem here.

baichuan-assistant avatar Jan 25 '24 08:01 baichuan-assistant