modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found
Description
Trying to deploy Mistral-7B with Triton + TensorRT-LLM and running into the error below.
Triton Information
Are you using the Triton container or did you build it yourself?
Using the container: nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3
To Reproduce
Steps to reproduce the behavior:
Converted raw weights and built engine using instructions from https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#mistral-v01
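For reference, the build step looked roughly like this; the flag set follows the linked Mistral v0.1 instructions, and the paths are from my setup, so treat it as a sketch rather than an exact record:

```bash
cd TensorRT-LLM/examples/llama
# Single-GPU fp16 engine built straight from the HF checkpoint,
# per the Mistral v0.1 section of the linked README (paths are mine).
python3 build.py --model_dir /home/ubuntu/hf_mistral_weights \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    --use_gemm_plugin float16 \
    --output_dir /home/ubuntu/tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
    --max_input_len 32256
```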
Tested using run.py and inference works successfully.
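The smoke test was along these lines (again a sketch; paths are mine):

```bash
cd TensorRT-LLM/examples/llama
# Quick generation test against the freshly built engine.
python3 run.py --max_output_len=50 \
    --tokenizer_dir /home/ubuntu/hf_mistral_weights \
    --engine_dir /home/ubuntu/tmp/mistral/7B/trt_engines/fp16/1-gpu/
```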
Updated all the config.pbtxt files based on the guide https://github.com/triton-inference-server/tutorials/blob/main/Popular_Models_Guide/Llama2/trtllm_guide.md
Updated several other parameters in the config.pbtxt files that weren't addressed in the instructions (sketched below).
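Roughly, the values were filled in with the tools/fill_template.py helper that ships in this repo; the parameter names are as I remember them from the 23.10-era templates, so the exact keys may differ:

```bash
cd /tensorrtllm_backend
# Point the tokenizer-based pre/post-processing models at the HF weights.
python3 tools/fill_template.py -i all_models/inflight_batcher_llm/preprocessing/config.pbtxt \
    tokenizer_dir:/hf_mistral_weights,tokenizer_type:llama
python3 tools/fill_template.py -i all_models/inflight_batcher_llm/postprocessing/config.pbtxt \
    tokenizer_dir:/hf_mistral_weights,tokenizer_type:llama
# Point the tensorrt_llm model at the engine directory as mounted in the container.
python3 tools/fill_template.py -i all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt \
    gpt_model_type:inflight_fused_batching,gpt_model_path:/engines
```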
Start up the Docker container:

```bash
docker run --rm -it --net host --shm-size=2g \
  --ulimit memlock=-1 --ulimit stack=67108864 --gpus all \
  -v /home/ubuntu/tensorrtllm_backend:/tensorrtllm_backend \
  -v /home/ubuntu/hf_mistral_weights:/hf_mistral_weights \
  -v /home/ubuntu/tmp/mistral/7B/trt_engines/fp16/1-gpu/:/engines \
  nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3
```
Copy the inflight_batcher_llm model directory into the server's model repository:

```bash
cp -R /tensorrtllm_backend/all_models/inflight_batcher_llm /opt/tritonserver/.
```
Launch the Triton server:

```bash
python3 /tensorrtllm_backend/scripts/launch_triton_server.py --world_size=1 \
    --model_repo=/opt/tritonserver/inflight_batcher_llm
```
It fails with the following error:
```
I0214 02:34:39.939687 128 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0214 02:34:39.939702 128 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
I0214 02:34:39.939706 128 cuda_memory_manager.cc:107] CUDA memory pool is created on device 2 with size 67108864
I0214 02:34:39.939710 128 cuda_memory_manager.cc:107] CUDA memory pool is created on device 3 with size 67108864
W0214 02:34:40.148747 128 server.cc:238] failed to enable peer access for some device pairs
I0214 02:34:40.151177 128 model_lifecycle.cc:461] loading: postprocessing:1
I0214 02:34:40.151282 128 model_lifecycle.cc:461] loading: preprocessing:1
I0214 02:34:40.151367 128 model_lifecycle.cc:461] loading: tensorrt_llm:1
I0214 02:34:40.151416 128 model_lifecycle.cc:461] loading: tensorrt_llm_bls:1
I0214 02:34:40.161061 128 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)
I0214 02:34:40.161084 128 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)
I0214 02:34:40.211360 128 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: tensorrt_llm_bls_0_0 (CPU device 0)
[TensorRT-LLM][WARNING] max_tokens_in_paged_kv_cache is not specified, will use default value
[TensorRT-LLM][WARNING] batch_scheduler_policy parameter was not found or is invalid (must be max_utilization or guaranteed_no_evict)
[TensorRT-LLM][WARNING] kv_cache_free_gpu_mem_fraction is not specified, will use default value of 0.85 or max_tokens_in_paged_kv_cache
[TensorRT-LLM][WARNING] max_num_sequences is not specified, will be set to the TRT engine max_batch_size
[TensorRT-LLM][WARNING] enable_trt_overlap is not specified, will be set to true
E0214 02:34:40.211909 128 backend_model.cc:634] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found
E0214 02:34:40.211962 128 model_lifecycle.cc:621] failed to load 'tensorrt_llm' version 1: Internal: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found
I0214 02:34:40.211974 128 model_lifecycle.cc:756] failed to load 'tensorrt_llm'
I0214 02:34:40.455701 128 model_lifecycle.cc:818] successfully loaded 'tensorrt_llm_bls'
I0214 02:34:40.713354 128 model_lifecycle.cc:818] successfully loaded 'postprocessing'
I0214 02:34:40.716190 128 model_lifecycle.cc:818] successfully loaded 'preprocessing'
E0214 02:34:40.716281 128 model_repository_manager.cc:563] Invalid argument: ensemble 'ensemble' depends on 'tensorrt_llm' which has no loaded version. Model 'tensorrt_llm' loading failed with error: version 1 is at UNAVAILABLE state: Internal: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found;
I0214 02:34:40.716342 128 server.cc:592]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0214 02:34:40.716400 128 server.cc:619]
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| python      | /opt/tritonserver/backends/python/libtriton_python.so           | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","shm-region-prefix-name":"prefix0_","default-max-batch-size":"4"}} |
| tensorrtllm | /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+--------+
I0214 02:34:40.716454 128 server.cc:662]
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------+
| Model            | Version | Status                                                                                                                                     |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------+
| postprocessing   | 1       | READY                                                                                                                                      |
| preprocessing    | 1       | READY                                                                                                                                      |
| tensorrt_llm     | 1       | UNAVAILABLE: Internal: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'builder_config' not found |
| tensorrt_llm_bls | 1       | READY                                                                                                                                      |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------+
I0214 02:34:40.794660 128 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA A10G
I0214 02:34:40.794698 128 metrics.cc:817] Collecting metrics for GPU 1: NVIDIA A10G
I0214 02:34:40.794704 128 metrics.cc:817] Collecting metrics for GPU 2: NVIDIA A10G
I0214 02:34:40.794710 128 metrics.cc:817] Collecting metrics for GPU 3: NVIDIA A10G
I0214 02:34:40.794943 128 metrics.cc:710] Collecting CPU metrics
I0214 02:34:40.795147 128 tritonserver.cc:2458]
+----------------------------------+----------------------------------------+
| Option                           | Value                                  |
+----------------------------------+----------------------------------------+
| server_id                        | triton                                 |
| server_version                   | 2.39.0                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /opt/tritonserver/inflight_batcher_llm |
| model_control_mode               | MODE_NONE                              |
| strict_model_config              | 1                                      |
| rate_limit                       | OFF                                    |
| pinned_memory_pool_byte_size     | 268435456                              |
| cuda_memory_pool_byte_size{0}    | 67108864                               |
| cuda_memory_pool_byte_size{1}    | 67108864                               |
| cuda_memory_pool_byte_size{2}    | 67108864                               |
| cuda_memory_pool_byte_size{3}    | 67108864                               |
| min_supported_compute_capability | 6.0                                    |
| strict_readiness                 | 1                                      |
| exit_timeout                     | 30                                     |
| cache_enabled                    | 0                                      |
+----------------------------------+----------------------------------------+
I0214 02:34:40.795170 128 server.cc:293] Waiting for in-flight requests to complete.
I0214 02:34:40.795178 128 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I0214 02:34:40.795629 128 server.cc:324] All models are stopped, unloading models
I0214 02:34:40.795646 128 server.cc:331] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0214 02:34:41.795733 128 server.cc:331] Timeout 29: Found 3 live models and 0 in-flight non-inference requests
Cleaning up...
Cleaning up...
Cleaning up...
I0214 02:34:41.831995 128 model_lifecycle.cc:603] successfully unloaded 'tensorrt_llm_bls' version 1
I0214 02:34:42.045779 128 model_lifecycle.cc:603] successfully unloaded 'postprocessing' version 1
I0214 02:34:42.054175 128 model_lifecycle.cc:603] successfully unloaded 'preprocessing' version 1
I0214 02:34:42.795816 128 server.cc:331] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
  Process name: [[16837,1],0]
  Exit code: 1
```
Expected behavior
The model should be deployed successfully and be ready for inference.
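For what it's worth, the key the backend cannot find should come from the config.json that the engine build writes into the output directory, so one quick check is to dump that file's top-level keys. This is just a diagnostic sketch (path as mounted in my container, not from the guide):

```bash
# The 23.10 tensorrtllm backend parses /engines/config.json and expects a
# top-level 'builder_config' section; if it is absent, the engine was
# probably built with a TensorRT-LLM version that doesn't match the
# backend shipped in this container.
python3 -c "import json; print(sorted(json.load(open('/engines/config.json')).keys()))"
```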