CUDA runtime error in cudaDeviceGetDefaultMemPool
Description When deploying a Hugging Face model in Triton Server with tensorrtllm_backend, I always get a CUDA runtime error in cudaDeviceGetDefaultMemPool. The model was successfully converted with TensorRT-LLM (i.e. inference with the model engines works in the TRT-LLM container).
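As far as I can tell, cudaDeviceGetDefaultMemPool returns "operation not supported" when the device does not report support for stream-ordered memory pools, which some virtualized GPUs do not expose. A minimal diagnostic sketch (my own, not part of TensorRT-LLM) to check this per device; compile with nvcc check_mempool.cu -o check_mempool:

// check_mempool.cu: report whether each visible GPU supports the
// stream-ordered allocator that cudaDeviceGetDefaultMemPool relies on.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        printf("cudaGetDeviceCount failed\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        int pools = 0;
        // 0 on devices (e.g. some vGPU profiles) where the default
        // memory pool is unavailable; 1 where it is supported.
        cudaDeviceGetAttribute(&pools, cudaDevAttrMemoryPoolsSupported, dev);
        printf("device %d: cudaDevAttrMemoryPoolsSupported = %d\n", dev, pools);
    }
    return 0;
}

If the vGPU configuration is the culprit, I would expect this to print 0 for all four GRID devices listed below.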
System Ubuntu 22.04.4 LTS, Driver Version: 535.104.05, CUDA Version: 12.2
➜ tensorrtllm_backend git:(release/0.5.0) ✗ nvidia-smi
Tue Mar 5 19:47:15 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 GRID A100DX-80C On | 00000000:02:04.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 GRID A100DX-80C On | 00000000:02:05.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 GRID A100DX-80C On | 00000000:02:06.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 GRID A100DX-80C On | 00000000:02:07.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Triton Information nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3. I tried both the container from NGC and one built from the tensorrtllm_backend repo; same behavior. I also tried newer versions and different driver/CUDA combinations, always with the same behavior.
To Reproduce I followed this tutorial exactly, using Llama-2-13b-chat-hf with the configuration from the tutorial: https://developer.nvidia.com/blog/optimizing-inference-on-llms-with-tensorrt-llm-now-publicly-available/. Running the model engines in the TensorRT-LLM container works fine (I can see activity on all 4 GPUs when I call nvidia-smi during inference). When I try to run Triton Server, I get the error shown under Actual behavior below.
Expected behavior Something like this:
+----------------------+---------+--------+
| Model | Version | Status |
+----------------------+---------+--------+
| <model_name> | <v> | READY |
| .. | . | .. |
| .. | . | .. |
+----------------------+---------+--------+
...
Actual behavior
root@hal3000:/opt/tritonserver# python /opt/scripts/launch_triton_server.py --model_repo /all_models/inflight_batcher_llm --world_size 4
root@hal3000:/opt/tritonserver# I0305 19:29:53.282831 117 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x10020000000' with size 268435456
I0305 19:29:53.289096 116 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x10020000000' with size 268435456
I0305 19:29:53.295001 119 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x10020000000' with size 268435456
I0305 19:29:53.300931 118 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x10020000000' with size 268435456
I0305 19:29:53.306811 117 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0305 19:29:53.306831 117 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
I0305 19:29:53.306834 117 cuda_memory_manager.cc:107] CUDA memory pool is created on device 2 with size 67108864
I0305 19:29:53.306836 117 cuda_memory_manager.cc:107] CUDA memory pool is created on device 3 with size 67108864
I0305 19:29:53.306827 116 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0305 19:29:53.306834 116 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
I0305 19:29:53.306836 116 cuda_memory_manager.cc:107] CUDA memory pool is created on device 2 with size 67108864
I0305 19:29:53.306838 116 cuda_memory_manager.cc:107] CUDA memory pool is created on device 3 with size 67108864
I0305 19:29:53.306867 119 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0305 19:29:53.306876 119 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
I0305 19:29:53.306878 119 cuda_memory_manager.cc:107] CUDA memory pool is created on device 2 with size 67108864
I0305 19:29:53.306880 119 cuda_memory_manager.cc:107] CUDA memory pool is created on device 3 with size 67108864
I0305 19:29:53.307080 118 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0305 19:29:53.307099 118 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
I0305 19:29:53.307101 118 cuda_memory_manager.cc:107] CUDA memory pool is created on device 2 with size 67108864
I0305 19:29:53.307103 118 cuda_memory_manager.cc:107] CUDA memory pool is created on device 3 with size 67108864
I0305 19:29:54.379254 117 model_lifecycle.cc:461] loading: tensorrt_llm:1
I0305 19:29:54.379321 117 model_lifecycle.cc:461] loading: preprocessing:1
I0305 19:29:54.379370 117 model_lifecycle.cc:461] loading: postprocessing:1
I0305 19:29:54.393911 116 model_lifecycle.cc:461] loading: tensorrt_llm:1
I0305 19:29:54.393956 116 model_lifecycle.cc:461] loading: preprocessing:1
I0305 19:29:54.393986 116 model_lifecycle.cc:461] loading: postprocessing:1
I0305 19:29:54.398459 119 model_lifecycle.cc:461] loading: tensorrt_llm:1
I0305 19:29:54.398457 118 model_lifecycle.cc:461] loading: tensorrt_llm:1
I0305 19:29:54.398534 119 model_lifecycle.cc:461] loading: preprocessing:1
I0305 19:29:54.398537 118 model_lifecycle.cc:461] loading: preprocessing:1
I0305 19:29:54.398585 118 model_lifecycle.cc:461] loading: postprocessing:1
I0305 19:29:54.398615 119 model_lifecycle.cc:461] loading: postprocessing:1
[TensorRT-LLM][WARNING] max_tokens_in_paged_kv_cache is not specified, will use default value
[TensorRT-LLM][WARNING] batch_scheduler_policy parameter was not found or is invalid (must be max_utilization or guaranteed_no_evict)
[TensorRT-LLM][WARNING] enable_trt_overlap is not specified, will be set to true
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be number, but is null
[TensorRT-LLM][WARNING] Optional value for parameter max_num_tokens will not be set.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
I0305 19:29:54.423351 117 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)
I0305 19:29:54.423376 117 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)
[TensorRT-LLM][WARNING] max_tokens_in_paged_kv_cache is not specified, will use default value
[TensorRT-LLM][WARNING] batch_scheduler_policy parameter was not found or is invalid (must be max_utilization or guaranteed_no_evict)
[TensorRT-LLM][WARNING] enable_trt_overlap is not specified, will be set to true
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be number, but is null
[TensorRT-LLM][WARNING] Optional value for parameter max_num_tokens will not be set.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
I0305 19:29:54.433686 116 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)
I0305 19:29:54.433683 116 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)
[TensorRT-LLM][WARNING] max_tokens_in_paged_kv_cache is not specified, will use default value
[TensorRT-LLM][WARNING] batch_scheduler_policy parameter was not found or is invalid (must be max_utilization or guaranteed_no_evict)
[TensorRT-LLM][WARNING] enable_trt_overlap is not specified, will be set to true
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be number, but is null
[TensorRT-LLM][WARNING] Optional value for parameter max_num_tokens will not be set.
[TensorRT-LLM][WARNING] max_tokens_in_paged_kv_cache is not specified, will use default value
[TensorRT-LLM][WARNING] batch_scheduler_policy parameter was not found or is invalid (must be max_utilization or guaranteed_no_evict)
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
[TensorRT-LLM][WARNING] enable_trt_overlap is not specified, will be set to true
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be number, but is null
[TensorRT-LLM][WARNING] Optional value for parameter max_num_tokens will not be set.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
I0305 19:29:54.447732 119 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)
I0305 19:29:54.447770 119 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)
I0305 19:29:54.448162 118 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)
I0305 19:29:54.448289 118 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)
[TensorRT-LLM][INFO] MPI size: 4, rank: 3
[TensorRT-LLM][INFO] MPI size: 4, rank: 0
[TensorRT-LLM][INFO] MPI size: 4, rank: 1
[TensorRT-LLM][INFO] MPI size: 4, rank: 2
Downloading tokenizer_config.json: 100%|██████████| 1.62k/1.62k [00:00<00:00, 7.54MB/s]
Downloading tokenizer.model: 100%|██████████| 500k/500k [00:00<00:00, 8.91MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 414/414 [00:00<00:00, 3.33MB/s]
Downloading tokenizer.json: 100%|██████████| 1.84M/1.84M [00:00<00:00, 10.1MB/s]
I0305 19:29:56.596386 117 model_lifecycle.cc:818] successfully loaded 'postprocessing'
I0305 19:29:56.613390 118 model_lifecycle.cc:818] successfully loaded 'postprocessing'
I0305 19:29:56.616850 119 model_lifecycle.cc:818] successfully loaded 'postprocessing'
I0305 19:29:56.627085 117 model_lifecycle.cc:818] successfully loaded 'preprocessing'
I0305 19:29:56.628228 116 model_lifecycle.cc:818] successfully loaded 'preprocessing'
I0305 19:29:56.666116 118 model_lifecycle.cc:818] successfully loaded 'preprocessing'
I0305 19:29:56.732012 119 model_lifecycle.cc:818] successfully loaded 'preprocessing'
I0305 19:29:56.793750 116 model_lifecycle.cc:818] successfully loaded 'postprocessing'
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 4
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 64
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 1
E0305 19:29:57.029880 119 backend_model.cc:634] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/bufferManager.cpp:170)
1 0x7f981281e045 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x36045) [0x7f981281e045]
2 0x7f9812873925 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8b925) [0x7f9812873925]
3 0x7f98128747bf /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8c7bf) [0x7f98128747bf]
4 0x7f98128d5545 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0xed545) [0x7f98128d5545]
5 0x7f981284ef4e /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x66f4e) [0x7f981284ef4e]
6 0x7f981283ec0c /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x56c0c) [0x7f981283ec0c]
7 0x7f98128395f5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x515f5) [0x7f98128395f5]
8 0x7f98128374db /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x4f4db) [0x7f98128374db]
9 0x7f981281b182 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x33182) [0x7f981281b182]
10 0x7f981281b235 TRITONBACKEND_ModelInstanceInitialize + 101
11 0x7f989ab9aa86 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a4a86) [0x7f989ab9aa86]
12 0x7f989ab9bcc6 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a5cc6) [0x7f989ab9bcc6]
13 0x7f989ab7ec15 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x188c15) [0x7f989ab7ec15]
14 0x7f989ab7f256 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x189256) [0x7f989ab7f256]
15 0x7f989ab8b27d /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19527d) [0x7f989ab8b27d]
16 0x7f989a1f9ee8 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x99ee8) [0x7f989a1f9ee8]
17 0x7f989ab7597b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x17f97b) [0x7f989ab7597b]
18 0x7f989ab85695 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x18f695) [0x7f989ab85695]
19 0x7f989ab8a50b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19450b) [0x7f989ab8a50b]
20 0x7f989ac73610 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x27d610) [0x7f989ac73610]
21 0x7f989ac76d03 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x280d03) [0x7f989ac76d03]
22 0x7f989adc38b2 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x3cd8b2) [0x7f989adc38b2]
23 0x7f989a464253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f989a464253]
24 0x7f989a1f4ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f989a1f4ac3]
25 0x7f989a285bf4 clone + 68
E0305 19:29:57.030002 119 model_lifecycle.cc:621] failed to load 'tensorrt_llm' version 1: Internal: unexpected error when creating modelInstanceState: [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/bufferManager.cpp:170)
1 0x7f981281e045 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x36045) [0x7f981281e045]
2 0x7f9812873925 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8b925) [0x7f9812873925]
3 0x7f98128747bf /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8c7bf) [0x7f98128747bf]
4 0x7f98128d5545 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0xed545) [0x7f98128d5545]
5 0x7f981284ef4e /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x66f4e) [0x7f981284ef4e]
6 0x7f981283ec0c /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x56c0c) [0x7f981283ec0c]
7 0x7f98128395f5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x515f5) [0x7f98128395f5]
8 0x7f98128374db /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x4f4db) [0x7f98128374db]
9 0x7f981281b182 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x33182) [0x7f981281b182]
10 0x7f981281b235 TRITONBACKEND_ModelInstanceInitialize + 101
11 0x7f989ab9aa86 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a4a86) [0x7f989ab9aa86]
12 0x7f989ab9bcc6 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a5cc6) [0x7f989ab9bcc6]
13 0x7f989ab7ec15 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x188c15) [0x7f989ab7ec15]
14 0x7f989ab7f256 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x189256) [0x7f989ab7f256]
15 0x7f989ab8b27d /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19527d) [0x7f989ab8b27d]
16 0x7f989a1f9ee8 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x99ee8) [0x7f989a1f9ee8]
17 0x7f989ab7597b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x17f97b) [0x7f989ab7597b]
18 0x7f989ab85695 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x18f695) [0x7f989ab85695]
19 0x7f989ab8a50b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19450b) [0x7f989ab8a50b]
20 0x7f989ac73610 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x27d610) [0x7f989ac73610]
21 0x7f989ac76d03 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x280d03) [0x7f989ac76d03]
22 0x7f989adc38b2 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x3cd8b2) [0x7f989adc38b2]
23 0x7f989a464253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f989a464253]
24 0x7f989a1f4ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f989a1f4ac3]
25 0x7f989a285bf4 clone + 68
I0305 19:29:57.030034 119 model_lifecycle.cc:756] failed to load 'tensorrt_llm'
E0305 19:29:57.030114 119 model_repository_manager.cc:563] Invalid argument: ensemble 'ensemble' depends on 'tensorrt_llm' which has no loaded version. Model 'tensorrt_llm' loading failed with error: version 1 is at UNAVAILABLE state: Internal: unexpected error when creating modelInstanceState: [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/bufferManager.cpp:170)
1 0x7f981281e045 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x36045) [0x7f981281e045]
2 0x7f9812873925 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8b925) [0x7f9812873925]
3 0x7f98128747bf /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8c7bf) [0x7f98128747bf]
4 0x7f98128d5545 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0xed545) [0x7f98128d5545]
5 0x7f981284ef4e /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x66f4e) [0x7f981284ef4e]
6 0x7f981283ec0c /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x56c0c) [0x7f981283ec0c]
7 0x7f98128395f5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x515f5) [0x7f98128395f5]
8 0x7f98128374db /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x4f4db) [0x7f98128374db]
9 0x7f981281b182 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x33182) [0x7f981281b182]
10 0x7f981281b235 TRITONBACKEND_ModelInstanceInitialize + 101
11 0x7f989ab9aa86 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a4a86) [0x7f989ab9aa86]
12 0x7f989ab9bcc6 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a5cc6) [0x7f989ab9bcc6]
13 0x7f989ab7ec15 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x188c15) [0x7f989ab7ec15]
14 0x7f989ab7f256 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x189256) [0x7f989ab7f256]
15 0x7f989ab8b27d /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19527d) [0x7f989ab8b27d]
16 0x7f989a1f9ee8 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x99ee8) [0x7f989a1f9ee8]
17 0x7f989ab7597b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x17f97b) [0x7f989ab7597b]
18 0x7f989ab85695 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x18f695) [0x7f989ab85695]
19 0x7f989ab8a50b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19450b) [0x7f989ab8a50b]
20 0x7f989ac73610 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x27d610) [0x7f989ac73610]
21 0x7f989ac76d03 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x280d03) [0x7f989ac76d03]
22 0x7f989adc38b2 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x3cd8b2) [0x7f989adc38b2]
23 0x7f989a464253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f989a464253]
24 0x7f989a1f4ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f989a1f4ac3]
25 0x7f989a285bf4 clone + 68;
I0305 19:29:57.030195 119 server.cc:592]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0305 19:29:57.030246 119 server.cc:619]
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------+
| tensorrtllm | /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.0000 |
| | | 00","default-max-batch-size":"4"}} |
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.0000 |
| | | 00","shm-region-prefix-name":"prefix3_","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------+
I0305 19:29:57.030338 119 server.cc:662]
+----------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+----------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| postprocessing | 1 | READY |
| preprocessing | 1 | READY |
| tensorrt_llm | 1 | UNAVAILABLE: Internal: unexpected error when creating modelInstanceState: [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation no |
| | | t supported (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/bufferManager.cpp:170) |
| | | 1 0x7f981281e045 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x36045) [0x7f981281e045] |
| | | 2 0x7f9812873925 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8b925) [0x7f9812873925] |
| | | 3 0x7f98128747bf /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8c7bf) [0x7f98128747bf] |
| | | 4 0x7f98128d5545 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0xed545) [0x7f98128d5545] |
| | | 5 0x7f981284ef4e /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x66f4e) [0x7f981284ef4e] |
| | | 6 0x7f981283ec0c /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x56c0c) [0x7f981283ec0c] |
| | | 7 0x7f98128395f5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x515f5) [0x7f98128395f5] |
| | | 8 0x7f98128374db /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x4f4db) [0x7f98128374db] |
| | | 9 0x7f981281b182 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x33182) [0x7f981281b182] |
| | | 10 0x7f981281b235 TRITONBACKEND_ModelInstanceInitialize + 101 |
| | | 11 0x7f989ab9aa86 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a4a86) [0x7f989ab9aa86] |
| | | 12 0x7f989ab9bcc6 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a5cc6) [0x7f989ab9bcc6] |
| | | 13 0x7f989ab7ec15 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x188c15) [0x7f989ab7ec15] |
| | | 14 0x7f989ab7f256 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x189256) [0x7f989ab7f256] |
| | | 15 0x7f989ab8b27d /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19527d) [0x7f989ab8b27d] |
| | | 16 0x7f989a1f9ee8 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x99ee8) [0x7f989a1f9ee8] |
| | | 17 0x7f989ab7597b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x17f97b) [0x7f989ab7597b] |
| | | 18 0x7f989ab85695 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x18f695) [0x7f989ab85695] |
| | | 19 0x7f989ab8a50b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19450b) [0x7f989ab8a50b] |
| | | 20 0x7f989ac73610 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x27d610) [0x7f989ac73610] |
| | | 21 0x7f989ac76d03 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x280d03) [0x7f989ac76d03] |
| | | 22 0x7f989adc38b2 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x3cd8b2) [0x7f989adc38b2] |
| | | 23 0x7f989a464253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f989a464253] |
| | | 24 0x7f989a1f4ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f989a1f4ac3] |
| | | 25 0x7f989a285bf4 clone + 68 |
+----------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0305 19:29:57.082829 119 metrics.cc:817] Collecting metrics for GPU 0: GRID A100DX-80C
I0305 19:29:57.082854 119 metrics.cc:817] Collecting metrics for GPU 1: GRID A100DX-80C
I0305 19:29:57.082861 119 metrics.cc:817] Collecting metrics for GPU 2: GRID A100DX-80C
I0305 19:29:57.082865 119 metrics.cc:817] Collecting metrics for GPU 3: GRID A100DX-80C
I0305 19:29:57.083066 119 metrics.cc:710] Collecting CPU metrics
I0305 19:29:57.083220 119 tritonserver.cc:2458]
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.39.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_d |
| | ata parameters statistics trace logging |
| model_repository_path[0] | /all_models/inflight_batcher_llm |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| cuda_memory_pool_byte_size{1} | 67108864 |
| cuda_memory_pool_byte_size{2} | 67108864 |
| cuda_memory_pool_byte_size{3} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0305 19:29:57.083232 119 server.cc:293] Waiting for in-flight requests to complete.
I0305 19:29:57.083237 119 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I0305 19:29:57.083326 119 server.cc:324] All models are stopped, unloading models
I0305 19:29:57.083333 119 server.cc:331] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 4
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 64
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 1
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 4
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 64
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 1
E0305 19:29:57.373556 116 backend_model.cc:634] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/bufferManager.cpp:170)
1 0x7f17be81e045 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x36045) [0x7f17be81e045]
2 0x7f17be873925 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8b925) [0x7f17be873925]
3 0x7f17be8747bf /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8c7bf) [0x7f17be8747bf]
4 0x7f17be8d5545 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0xed545) [0x7f17be8d5545]
5 0x7f17be84ef4e /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x66f4e) [0x7f17be84ef4e]
6 0x7f17be83ec0c /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x56c0c) [0x7f17be83ec0c]
7 0x7f17be8395f5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x515f5) [0x7f17be8395f5]
8 0x7f17be8374db /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x4f4db) [0x7f17be8374db]
9 0x7f17be81b182 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x33182) [0x7f17be81b182]
10 0x7f17be81b235 TRITONBACKEND_ModelInstanceInitialize + 101
11 0x7f183559aa86 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a4a86) [0x7f183559aa86]
12 0x7f183559bcc6 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a5cc6) [0x7f183559bcc6]
13 0x7f183557ec15 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x188c15) [0x7f183557ec15]
14 0x7f183557f256 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x189256) [0x7f183557f256]
15 0x7f183558b27d /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19527d) [0x7f183558b27d]
16 0x7f1834bf9ee8 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x99ee8) [0x7f1834bf9ee8]
17 0x7f183557597b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x17f97b) [0x7f183557597b]
18 0x7f1835585695 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x18f695) [0x7f1835585695]
19 0x7f183558a50b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19450b) [0x7f183558a50b]
20 0x7f1835673610 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x27d610) [0x7f1835673610]
21 0x7f1835676d03 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x280d03) [0x7f1835676d03]
22 0x7f18357c38b2 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x3cd8b2) [0x7f18357c38b2]
23 0x7f1834e64253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f1834e64253]
24 0x7f1834bf4ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f1834bf4ac3]
25 0x7f1834c85bf4 clone + 68
E0305 19:29:57.373760 116 model_lifecycle.cc:621] failed to load 'tensorrt_llm' version 1: Internal: unexpected error when creating modelInstanceState: [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/bufferManager.cpp:170)
1 0x7f17be81e045 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x36045) [0x7f17be81e045]
2 0x7f17be873925 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8b925) [0x7f17be873925]
3 0x7f17be8747bf /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8c7bf) [0x7f17be8747bf]
4 0x7f17be8d5545 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0xed545) [0x7f17be8d5545]
5 0x7f17be84ef4e /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x66f4e) [0x7f17be84ef4e]
6 0x7f17be83ec0c /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x56c0c) [0x7f17be83ec0c]
7 0x7f17be8395f5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x515f5) [0x7f17be8395f5]
8 0x7f17be8374db /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x4f4db) [0x7f17be8374db]
9 0x7f17be81b182 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x33182) [0x7f17be81b182]
10 0x7f17be81b235 TRITONBACKEND_ModelInstanceInitialize + 101
11 0x7f183559aa86 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a4a86) [0x7f183559aa86]
12 0x7f183559bcc6 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a5cc6) [0x7f183559bcc6]
13 0x7f183557ec15 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x188c15) [0x7f183557ec15]
14 0x7f183557f256 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x189256) [0x7f183557f256]
15 0x7f183558b27d /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19527d) [0x7f183558b27d]
16 0x7f1834bf9ee8 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x99ee8) [0x7f1834bf9ee8]
17 0x7f183557597b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x17f97b) [0x7f183557597b]
18 0x7f1835585695 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x18f695) [0x7f1835585695]
19 0x7f183558a50b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19450b) [0x7f183558a50b]
20 0x7f1835673610 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x27d610) [0x7f1835673610]
21 0x7f1835676d03 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x280d03) [0x7f1835676d03]
22 0x7f18357c38b2 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x3cd8b2) [0x7f18357c38b2]
23 0x7f1834e64253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f1834e64253]
24 0x7f1834bf4ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f1834bf4ac3]
25 0x7f1834c85bf4 clone + 68
I0305 19:29:57.373813 116 model_lifecycle.cc:756] failed to load 'tensorrt_llm'
E0305 19:29:57.373955 116 model_repository_manager.cc:563] Invalid argument: ensemble 'ensemble' depends on 'tensorrt_llm' which has no loaded version. Model 'tensorrt_llm' loading failed with error: version 1 is at UNAVAILABLE state: Internal: unexpected error when creating modelInstanceState: [TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/bufferManager.cpp:170)
1 0x7f17be81e045 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x36045) [0x7f17be81e045]
2 0x7f17be873925 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8b925) [0x7f17be873925]
3 0x7f17be8747bf /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x8c7bf) [0x7f17be8747bf]
4 0x7f17be8d5545 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0xed545) [0x7f17be8d5545]
5 0x7f17be84ef4e /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x66f4e) [0x7f17be84ef4e]
6 0x7f17be83ec0c /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x56c0c) [0x7f17be83ec0c]
7 0x7f17be8395f5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x515f5) [0x7f17be8395f5]
8 0x7f17be8374db /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x4f4db) [0x7f17be8374db]
9 0x7f17be81b182 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x33182) [0x7f17be81b182]
10 0x7f17be81b235 TRITONBACKEND_ModelInstanceInitialize + 101
11 0x7f183559aa86 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a4a86) [0x7f183559aa86]
12 0x7f183559bcc6 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1a5cc6) [0x7f183559bcc6]
13 0x7f183557ec15 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x188c15) [0x7f183557ec15]
14 0x7f183557f256 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x189256) [0x7f183557f256]
15 0x7f183558b27d /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19527d) [0x7f183558b27d]
16 0x7f1834bf9ee8 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x99ee8) [0x7f1834bf9ee8]
17 0x7f183557597b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x17f97b) [0x7f183557597b]
18 0x7f1835585695 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x18f695) [0x7f1835585695]
19 0x7f183558a50b /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19450b) [0x7f183558a50b]
20 0x7f1835673610 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x27d610) [0x7f1835673610]
21 0x7f1835676d03 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x280d03) [0x7f1835676d03]
22 0x7f18357c38b2 /opt/tritonserver/bin/../lib/libtritonserver.so(+0x3cd8b2) [0x7f18357c38b2]
23 0x7f1834e64253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f1834e64253]
24 0x7f1834bf4ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f1834bf4ac3]
25 0x7f1834c85bf4 clone + 68;
I0305 19:29:57.374069 116 server.cc:592]
...
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.39.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_d |
| | ata parameters statistics trace logging |
| model_repository_path[0] | /all_models/inflight_batcher_llm |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| cuda_memory_pool_byte_size{1} | 67108864 |
| cuda_memory_pool_byte_size{2} | 67108864 |
| cuda_memory_pool_byte_size{3} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0305 19:29:58.617065 118 server.cc:293] Waiting for in-flight requests to complete.
I0305 19:29:58.617072 118 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I0305 19:29:58.617154 118 server.cc:324] All models are stopped, unloading models
I0305 19:29:58.617163 118 server.cc:331] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
I0305 19:29:58.639565 119 model_lifecycle.cc:603] successfully unloaded 'preprocessing' version 1
I0305 19:29:58.732184 116 model_lifecycle.cc:603] successfully unloaded 'postprocessing' version 1
I0305 19:29:58.736228 117 model_lifecycle.cc:603] successfully unloaded 'postprocessing' version 1
I0305 19:29:58.915008 117 model_lifecycle.cc:603] successfully unloaded 'preprocessing' version 1
I0305 19:29:58.922722 116 model_lifecycle.cc:603] successfully unloaded 'preprocessing' version 1
I0305 19:29:59.083530 119 server.cc:331] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
W0305 19:29:59.089428 119 metrics.cc:582] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0305 19:29:59.089462 119 metrics.cc:600] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W0305 19:29:59.089469 119 metrics.cc:624] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0305 19:29:59.089479 119 metrics.cc:582] Unable to get power limit for GPU 1. Status:Success, value:0.000000
W0305 19:29:59.089485 119 metrics.cc:600] Unable to get power usage for GPU 1. Status:Success, value:0.000000
W0305 19:29:59.089490 119 metrics.cc:624] Unable to get energy consumption for GPU 1. Status:Success, value:0
W0305 19:29:59.089497 119 metrics.cc:582] Unable to get power limit for GPU 2. Status:Success, value:0.000000
W0305 19:29:59.089502 119 metrics.cc:600] Unable to get power usage for GPU 2. Status:Success, value:0.000000
W0305 19:29:59.089506 119 metrics.cc:624] Unable to get energy consumption for GPU 2. Status:Success, value:0
W0305 19:29:59.089513 119 metrics.cc:582] Unable to get power limit for GPU 3. Status:Success, value:0.000000
W0305 19:29:59.089518 119 metrics.cc:600] Unable to get power usage for GPU 3. Status:Success, value:0.000000
W0305 19:29:59.089523 119 metrics.cc:624] Unable to get energy consumption for GPU 3. Status:Success, value:0
error: creating server: Internal - failed to load all models
I0305 19:29:59.427537 116 server.cc:331] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
W0305 19:29:59.433040 116 metrics.cc:582] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0305 19:29:59.433087 116 metrics.cc:600] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W0305 19:29:59.433095 116 metrics.cc:624] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0305 19:29:59.433105 116 metrics.cc:582] Unable to get power limit for GPU 1. Status:Success, value:0.000000
W0305 19:29:59.433115 116 metrics.cc:600] Unable to get power usage for GPU 1. Status:Success, value:0.000000
W0305 19:29:59.433121 116 metrics.cc:624] Unable to get energy consumption for GPU 1. Status:Success, value:0
W0305 19:29:59.433129 116 metrics.cc:582] Unable to get power limit for GPU 2. Status:Success, value:0.000000
W0305 19:29:59.433135 116 metrics.cc:600] Unable to get power usage for GPU 2. Status:Success, value:0.000000
W0305 19:29:59.433141 116 metrics.cc:624] Unable to get energy consumption for GPU 2. Status:Success, value:0
W0305 19:29:59.433148 116 metrics.cc:582] Unable to get power limit for GPU 3. Status:Success, value:0.000000
W0305 19:29:59.433154 116 metrics.cc:600] Unable to get power usage for GPU 3. Status:Success, value:0.000000
W0305 19:29:59.433161 116 metrics.cc:624] Unable to get energy consumption for GPU 3. Status:Success, value:0
I0305 19:29:59.470397 117 server.cc:331] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
error: creating server: Internal - failed to load all models
W0305 19:29:59.524316 117 metrics.cc:582] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0305 19:29:59.524353 117 metrics.cc:600] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W0305 19:29:59.524357 117 metrics.cc:624] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0305 19:29:59.524364 117 metrics.cc:582] Unable to get power limit for GPU 1. Status:Success, value:0.000000
W0305 19:29:59.524369 117 metrics.cc:600] Unable to get power usage for GPU 1. Status:Success, value:0.000000
W0305 19:29:59.524372 117 metrics.cc:624] Unable to get energy consumption for GPU 1. Status:Success, value:0
W0305 19:29:59.524378 117 metrics.cc:582] Unable to get power limit for GPU 2. Status:Success, value:0.000000
W0305 19:29:59.524380 117 metrics.cc:600] Unable to get power usage for GPU 2. Status:Success, value:0.000000
W0305 19:29:59.524383 117 metrics.cc:624] Unable to get energy consumption for GPU 2. Status:Success, value:0
W0305 19:29:59.524390 117 metrics.cc:582] Unable to get power limit for GPU 3. Status:Success, value:0.000000
W0305 19:29:59.524394 117 metrics.cc:600] Unable to get power usage for GPU 3. Status:Success, value:0.000000
W0305 19:29:59.524396 117 metrics.cc:624] Unable to get energy consumption for GPU 3. Status:Success, value:0
I0305 19:29:59.617248 118 server.cc:331] Timeout 29: Found 2 live models and 0 in-flight non-inference requests
Cleaning up...
Cleaning up...
W0305 19:29:59.625442 118 metrics.cc:582] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0305 19:29:59.625467 118 metrics.cc:600] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W0305 19:29:59.625472 118 metrics.cc:624] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0305 19:29:59.625481 118 metrics.cc:582] Unable to get power limit for GPU 1. Status:Success, value:0.000000
W0305 19:29:59.625485 118 metrics.cc:600] Unable to get power usage for GPU 1. Status:Success, value:0.000000
W0305 19:29:59.625488 118 metrics.cc:624] Unable to get energy consumption for GPU 1. Status:Success, value:0
W0305 19:29:59.625492 118 metrics.cc:582] Unable to get power limit for GPU 2. Status:Success, value:0.000000
W0305 19:29:59.625496 118 metrics.cc:600] Unable to get power usage for GPU 2. Status:Success, value:0.000000
W0305 19:29:59.625499 118 metrics.cc:624] Unable to get energy consumption for GPU 2. Status:Success, value:0
W0305 19:29:59.625503 118 metrics.cc:582] Unable to get power limit for GPU 3. Status:Success, value:0.000000
W0305 19:29:59.625506 118 metrics.cc:600] Unable to get power usage for GPU 3. Status:Success, value:0.000000
W0305 19:29:59.625509 118 metrics.cc:624] Unable to get energy consumption for GPU 3. Status:Success, value:0
I0305 19:29:59.954424 118 model_lifecycle.cc:603] successfully unloaded 'postprocessing' version 1
W0305 19:30:00.094796 119 metrics.cc:582] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0305 19:30:00.094820 119 metrics.cc:600] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W0305 19:30:00.094824 119 metrics.cc:624] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0305 19:30:00.094829 119 metrics.cc:582] Unable to get power limit for GPU 1. Status:Success, value:0.000000
W0305 19:30:00.094832 119 metrics.cc:600] Unable to get power usage for GPU 1. Status:Success, value:0.000000
W0305 19:30:00.094834 119 metrics.cc:624] Unable to get energy consumption for GPU 1. Status:Success, value:0
W0305 19:30:00.094840 119 metrics.cc:582] Unable to get power limit for GPU 2. Status:Success, value:0.000000
W0305 19:30:00.094843 119 metrics.cc:600] Unable to get power usage for GPU 2. Status:Success, value:0.000000
W0305 19:30:00.094846 119 metrics.cc:624] Unable to get energy consumption for GPU 2. Status:Success, value:0
W0305 19:30:00.094849 119 metrics.cc:582] Unable to get power limit for GPU 3. Status:Success, value:0.000000
W0305 19:30:00.094852 119 metrics.cc:600] Unable to get power usage for GPU 3. Status:Success, value:0.000000
W0305 19:30:00.094855 119 metrics.cc:624] Unable to get energy consumption for GPU 3. Status:Success, value:0
I0305 19:30:00.368031 118 model_lifecycle.cc:603] successfully unloaded 'preprocessing' version 1
W0305 19:30:00.438892 116 metrics.cc:582] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0305 19:30:00.438937 116 metrics.cc:600] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W0305 19:30:00.438941 116 metrics.cc:624] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0305 19:30:00.438946 116 metrics.cc:582] Unable to get power limit for GPU 1. Status:Success, value:0.000000
W0305 19:30:00.438949 116 metrics.cc:600] Unable to get power usage for GPU 1. Status:Success, value:0.000000
W0305 19:30:00.438953 116 metrics.cc:624] Unable to get energy consumption for GPU 1. Status:Success, value:0
W0305 19:30:00.438959 116 metrics.cc:582] Unable to get power limit for GPU 2. Status:Success, value:0.000000
W0305 19:30:00.438965 116 metrics.cc:600] Unable to get power usage for GPU 2. Status:Success, value:0.000000
W0305 19:30:00.438968 116 metrics.cc:624] Unable to get energy consumption for GPU 2. Status:Success, value:0
W0305 19:30:00.438973 116 metrics.cc:582] Unable to get power limit for GPU 3. Status:Success, value:0.000000
W0305 19:30:00.438976 116 metrics.cc:600] Unable to get power usage for GPU 3. Status:Success, value:0.000000
W0305 19:30:00.438979 116 metrics.cc:624] Unable to get energy consumption for GPU 3. Status:Success, value:0
W0305 19:30:00.529680 117 metrics.cc:582] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0305 19:30:00.529750 117 metrics.cc:600] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W0305 19:30:00.529754 117 metrics.cc:624] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0305 19:30:00.529762 117 metrics.cc:582] Unable to get power limit for GPU 1. Status:Success, value:0.000000
W0305 19:30:00.529765 117 metrics.cc:600] Unable to get power usage for GPU 1. Status:Success, value:0.000000
W0305 19:30:00.529769 117 metrics.cc:624] Unable to get energy consumption for GPU 1. Status:Success, value:0
W0305 19:30:00.529774 117 metrics.cc:582] Unable to get power limit for GPU 2. Status:Success, value:0.000000
W0305 19:30:00.529778 117 metrics.cc:600] Unable to get power usage for GPU 2. Status:Success, value:0.000000
W0305 19:30:00.529785 117 metrics.cc:624] Unable to get energy consumption for GPU 2. Status:Success, value:0
W0305 19:30:00.529790 117 metrics.cc:582] Unable to get power limit for GPU 3. Status:Success, value:0.000000
W0305 19:30:00.529792 117 metrics.cc:600] Unable to get power usage for GPU 3. Status:Success, value:0.000000
W0305 19:30:00.529795 117 metrics.cc:624] Unable to get energy consumption for GPU 3. Status:Success, value:0
I0305 19:30:00.617350 118 server.cc:331] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
W0305 19:30:00.625876 118 metrics.cc:582] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0305 19:30:00.625898 118 metrics.cc:600] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W0305 19:30:00.625902 118 metrics.cc:624] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0305 19:30:00.625906 118 metrics.cc:582] Unable to get power limit for GPU 1. Status:Success, value:0.000000
W0305 19:30:00.625908 118 metrics.cc:600] Unable to get power usage for GPU 1. Status:Success, value:0.000000
W0305 19:30:00.625910 118 metrics.cc:624] Unable to get energy consumption for GPU 1. Status:Success, value:0
W0305 19:30:00.625915 118 metrics.cc:582] Unable to get power limit for GPU 2. Status:Success, value:0.000000
W0305 19:30:00.625918 118 metrics.cc:600] Unable to get power usage for GPU 2. Status:Success, value:0.000000
W0305 19:30:00.625921 118 metrics.cc:624] Unable to get energy consumption for GPU 2. Status:Success, value:0
W0305 19:30:00.625930 118 metrics.cc:582] Unable to get power limit for GPU 3. Status:Success, value:0.000000
W0305 19:30:00.625932 118 metrics.cc:600] Unable to get power usage for GPU 3. Status:Success, value:0.000000
W0305 19:30:00.625936 118 metrics.cc:624] Unable to get energy consumption for GPU 3. Status:Success, value:0
error: creating server: Internal - failed to load all models
W0305 19:30:01.631565 118 metrics.cc:582] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W0305 19:30:01.631621 118 metrics.cc:600] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W0305 19:30:01.631625 118 metrics.cc:624] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0305 19:30:01.631632 118 metrics.cc:582] Unable to get power limit for GPU 1. Status:Success, value:0.000000
W0305 19:30:01.631635 118 metrics.cc:600] Unable to get power usage for GPU 1. Status:Success, value:0.000000
W0305 19:30:01.631639 118 metrics.cc:624] Unable to get energy consumption for GPU 1. Status:Success, value:0
W0305 19:30:01.631643 118 metrics.cc:582] Unable to get power limit for GPU 2. Status:Success, value:0.000000
W0305 19:30:01.631647 118 metrics.cc:600] Unable to get power usage for GPU 2. Status:Success, value:0.000000
W0305 19:30:01.631649 118 metrics.cc:624] Unable to get energy consumption for GPU 2. Status:Success, value:0
W0305 19:30:01.631653 118 metrics.cc:582] Unable to get power limit for GPU 3. Status:Success, value:0.000000
W0305 19:30:01.631657 118 metrics.cc:600] Unable to get power usage for GPU 3. Status:Success, value:0.000000
W0305 19:30:01.631660 118 metrics.cc:624] Unable to get energy consumption for GPU 3. Status:Success, value:0
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[20086,1],3]
Exit code: 1
--------------------------------------------------------------------------
Running into the exact same issue.
In our case it was solved by setting vGPU plugin parameters in VMware: https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#setting-vgpu-plugin-parameters-on-vmware-vsphere (see also https://kb.vmware.com/s/article/2142307).
We're seeing this on Azure with A10 GPUs. Does MIG prevent using this CUDA function?
Facing similar issues as well, using an L40 vGPU (l40_48c profile) on an Ubuntu 22.04 VM running on VMware.
In our case, it was solved as well by setting the right VM params: https://docs.nvidia.com/grid/13.0/grid-vgpu-user-guide/index.html#setting-vgpu-plugin-parameters-on-vmware-vsphere
May I know which VM params you changed to fix this? I've set pciPassthru.use64bitMMIO to TRUE and pciPassthru.64bitMMIOSizeGB to 128, and it still didn't work.
Same issue on A40 using vGPU on ESXi 7. Can someone let us know which parameter fixes it? The two mentioned by @yxchia98 are not enough. @vhojan @tobernat
I guess you need to set enable_uvm to 1? (edit: worked for me)
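For reference, the parameters mentioned in this thread are set as advanced configuration options on the VM. A sketch of the resulting entries (per the vGPU user guide linked above, plugin parameters take the form pciPassthru<n>.cfg.<name>; the device index 0 is an assumption and may differ in your setup):

pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "128"
pciPassthru0.cfg.enable_uvm = "1"

The first two address 64-bit MMIO sizing for large-memory GPUs; enable_uvm is the vGPU plugin parameter that made cudaDeviceGetDefaultMemPool work for me.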
Setting enable_uvm to 1 wasn't sufficient in our case. Does anyone have another solution?
Setting enable_uvm to 1 wasn't sufficient in my case either.