The model instance placement on GPU seems incorrect?
Description
When deploying with Triton, the model instance does not appear to be placed on the GPU specified in the model configuration.
Triton Information
nvcr.io/nvidia/tritonserver:25.03-py3
To Reproduce
config.pbtxt
backend: "python"
input [
  {
    name: "prompt"
    data_type: TYPE_STRING
    dims: [1]
  }
]
output [
  {
    name: "response"
    data_type: TYPE_STRING
    dims: [1]
  }
]
instance_group [
  {
    kind: KIND_GPU
    count: 1
    gpus: [1]
  }
]
model.py
# model_repository/animaginexl/1/model.py
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import numpy as np
from diffusers import DiffusionPipeline
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        model_path = os.path.join(args['model_repository'], args['model_version'], 'animaginexl')
        self.pipe = DiffusionPipeline.from_pretrained(model_path, torch_dtype="float16").to("cuda")
        self.pipe.set_progress_bar_config(disable=True)

    def execute(self, requests):
        responses = []
        for request in requests:
            prompt_tensor = pb_utils.get_input_tensor_by_name(request, "prompt")
            prompt = prompt_tensor.as_numpy().item().decode("utf-8")
            image = self.pipe(prompt).images[0]
            img_np = np.array(image).astype(np.uint8)
            out_tensor = pb_utils.Tensor("generated_image", img_np)
            inference_response = pb_utils.InferenceResponse(output_tensors=[out_tensor])
            responses.append(inference_response)
        return responses
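For comparison (not part of the original report), a minimal sketch of an initialize() that takes its device from the instance_group assignment Triton passes to the Python backend, rather than overriding CUDA_VISIBLE_DEVICES at import time. It assumes the Python backend's documented args keys (model_instance_device_id, model_repository, model_version) and that torch is importable in the stub environment:

# Sketch only, not the original repro: pin the pipeline to the device Triton
# assigned from instance_group instead of forcing CUDA_VISIBLE_DEVICES.
import os

import torch
from diffusers import DiffusionPipeline


class TritonPythonModel:
    def initialize(self, args):
        # With the instance_group above (kind: KIND_GPU, gpus: [1]), Triton passes
        # the assigned device ID here, so the pipeline lands on that GPU only.
        device_id = args["model_instance_device_id"]
        self.device = f"cuda:{device_id}"
        model_path = os.path.join(args["model_repository"], args["model_version"], "animaginexl")
        self.pipe = DiffusionPipeline.from_pretrained(
            model_path,
            torch_dtype=torch.float16,  # a real torch.dtype rather than the string "float16"
        ).to(self.device)
        self.pipe.set_progress_bar_config(disable=True)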
terminal output
I0505 06:10:39.222908 1872 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x7f6dcc000000' with size 268435456"
I0505 06:10:39.249076 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0505 06:10:39.249090 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 1 with size 67108864"
I0505 06:10:39.249098 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 2 with size 67108864"
I0505 06:10:39.249103 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 3 with size 67108864"
I0505 06:10:39.249109 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 4 with size 67108864"
I0505 06:10:39.249114 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 5 with size 67108864"
I0505 06:10:39.249121 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 6 with size 67108864"
I0505 06:10:39.249126 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 7 with size 67108864"
I0505 06:10:40.352171 1872 model_lifecycle.cc:473] "loading: animaginexl:1"
I0505 06:10:40.352206 1872 model_lifecycle.cc:473] "loading: orpheus:1"
I0505 06:10:44.550052 1872 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: orpheus_0_0 (GPU device 1)"
I0505 06:10:46.034864 1872 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: animaginexl_0_0 (GPU device 0)"
Passed `torch_dtype` torch.float32 is not a `torch.dtype`. Defaulting to `torch.float32`.
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 4.03it/s]
I0505 06:10:53.637473 1872 model_lifecycle.cc:849] "successfully loaded 'animaginexl'"
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.64it/s]
I0505 06:10:58.887751 1872 model_lifecycle.cc:849] "successfully loaded 'orpheus'"
I0505 06:10:58.887944 1872 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0505 06:10:58.888073 1872 server.cc:631]
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------+
| Backend | Path | Config |
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/bac |
| | | kends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------+
I0505 06:10:58.888210 1872 server.cc:674]
+-------------+---------+--------+
| Model | Version | Status |
+-------------+---------+--------+
| animaginexl | 1 | READY |
| orpheus | 1 | READY |
+-------------+---------+--------+
I0505 06:10:59.123453 1872 metrics.cc:890] "Collecting metrics for GPU 0: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123490 1872 metrics.cc:890] "Collecting metrics for GPU 1: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123499 1872 metrics.cc:890] "Collecting metrics for GPU 2: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123507 1872 metrics.cc:890] "Collecting metrics for GPU 3: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123515 1872 metrics.cc:890] "Collecting metrics for GPU 4: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123523 1872 metrics.cc:890] "Collecting metrics for GPU 5: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123531 1872 metrics.cc:890] "Collecting metrics for GPU 6: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123538 1872 metrics.cc:890] "Collecting metrics for GPU 7: NVIDIA A100 80GB PCIe"
I0505 06:10:59.177136 1872 metrics.cc:783] "Collecting CPU metrics"
I0505 06:10:59.177254 1872 tritonserver.cc:2598]
+----------------------------------+--------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.56.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration s |
| | ystem_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| model_config_name | |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| cuda_memory_pool_byte_size{1} | 67108864 |
| cuda_memory_pool_byte_size{2} | 67108864 |
| cuda_memory_pool_byte_size{3} | 67108864 |
| cuda_memory_pool_byte_size{4} | 67108864 |
| cuda_memory_pool_byte_size{5} | 67108864 |
| cuda_memory_pool_byte_size{6} | 67108864 |
| cuda_memory_pool_byte_size{7} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------+
I0505 06:10:59.180173 1872 grpc_server.cc:2560] "Started GRPCInferenceService at 0.0.0.0:8001"
I0505 06:10:59.180422 1872 http_server.cc:4755] "Started HTTPService at 0.0.0.0:8000"
I0505 06:10:59.221201 1872 http_server.cc:358] "Started Metrics Service at 0.0.0.0:8002"
Expected behavior
Each model instance should only occupy its assigned GPU. Instead, nvidia-smi shows the Python backend stub holding memory on every GPU:
yeleyi@node06:/data1/Aquila$ nvidia-smi
Mon May 5 14:05:59 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100 80GB PCIe Off | 00000000:67:00.0 Off | 0 |
| N/A 47C P0 67W / 300W | 14988MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100 80GB PCIe Off | 00000000:68:00.0 Off | 0 |
| N/A 48C P0 71W / 300W | 2746MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100 80GB PCIe Off | 00000000:6C:00.0 Off | 0 |
| N/A 46C P0 67W / 300W | 1870MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100 80GB PCIe Off | 00000000:6D:00.0 Off | 0 |
| N/A 46C P0 63W / 300W | 1870MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A100 80GB PCIe Off | 00000000:E5:00.0 Off | 0 |
| N/A 48C P0 74W / 300W | 1870MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A100 80GB PCIe Off | 00000000:E6:00.0 Off | 0 |
| N/A 47C P0 67W / 300W | 1870MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A100 80GB PCIe Off | 00000000:E7:00.0 Off | 0 |
| N/A 45C P0 67W / 300W | 1870MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A100 80GB PCIe Off | 00000000:E8:00.0 Off | 0 |
| N/A 48C P0 69W / 300W | 1486MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 3291444 C tritonserver 478MiB |
| 0 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 414MiB |
| 0 N/A N/A 3291794 C ...s/python/triton_python_backend_stub 14076MiB |
| 1 N/A N/A 3291444 C tritonserver 478MiB |
| 1 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 2254MiB |
| 2 N/A N/A 3291444 C tritonserver 478MiB |
| 2 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 1378MiB |
| 3 N/A N/A 3291444 C tritonserver 478MiB |
| 3 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 1378MiB |
| 4 N/A N/A 3291444 C tritonserver 478MiB |
| 4 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 1378MiB |
| 5 N/A N/A 3291444 C tritonserver 478MiB |
| 5 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 1378MiB |
| 6 N/A N/A 3291444 C tritonserver 478MiB |
| 6 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 1378MiB |
| 7 N/A N/A 3291444 C tritonserver 478MiB |
| 7 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 994MiB |
+---------------------------------------------------------------------------------------+
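Not part of the original report: a small, hypothetical diagnostic that could be called from initialize() to log what each stub process is assigned versus what it can actually see (assumes torch is importable in the stub environment):

import os

import torch


def log_device_placement(args):
    # args is the dict Triton passes to TritonPythonModel.initialize().
    print("model_instance_device_id:", args.get("model_instance_device_id"))
    print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
    print("visible devices (torch):", torch.cuda.device_count())
    print("current device (torch):", torch.cuda.current_device())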