The model instance placement on GPU seems incorrect?
Description
When deploying with Triton, the model instance does not appear to be placed on the GPU specified in the model configuration.
Triton Information
nvcr.io/nvidia/tritonserver:25.03-py3
To Reproduce
config.pbtxt
backend: "python"
input [
  {
    name: "prompt"
    data_type: TYPE_STRING
    dims: [1]
  }
]
output [
  {
    name: "response"
    data_type: TYPE_STRING
    dims: [1]
  }
]
instance_group [
  {
    kind: KIND_GPU
    count: 1
    gpus: [1]
  }
]
model.py
# model_repository/animaginexl/1/model.py
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import numpy as np
from diffusers import DiffusionPipeline
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        model_path = os.path.join(args['model_repository'], args['model_version'], 'animaginexl')
        self.pipe = DiffusionPipeline.from_pretrained(model_path, torch_dtype="float16").to("cuda")
        self.pipe.set_progress_bar_config(disable=True)

    def execute(self, requests):
        responses = []
        for request in requests:
            prompt_tensor = pb_utils.get_input_tensor_by_name(request, "prompt")
            prompt = prompt_tensor.as_numpy().item().decode("utf-8")
            image = self.pipe(prompt).images[0]
            img_np = np.array(image).astype(np.uint8)
            out_tensor = pb_utils.Tensor("generated_image", img_np)
            inference_response = pb_utils.InferenceResponse(output_tensors=[out_tensor])
            responses.append(inference_response)
        return responses
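For comparison (not part of the original report), a minimal sketch of an initialize() that takes its device from the instance_group assignment Triton passes to the Python backend, rather than overriding CUDA_VISIBLE_DEVICES at import time. It assumes the Python backend's documented args keys (model_instance_device_id, model_repository, model_version) and that torch is importable in the stub environment:

# Sketch only, not the original repro: pin the pipeline to the device Triton
# assigned from instance_group instead of forcing CUDA_VISIBLE_DEVICES.
import os

import torch
from diffusers import DiffusionPipeline


class TritonPythonModel:
    def initialize(self, args):
        # With the instance_group above (kind: KIND_GPU, gpus: [1]), Triton passes
        # the assigned device ID here, so the pipeline lands on that GPU only.
        device_id = args["model_instance_device_id"]
        self.device = f"cuda:{device_id}"
        model_path = os.path.join(args["model_repository"], args["model_version"], "animaginexl")
        self.pipe = DiffusionPipeline.from_pretrained(
            model_path,
            torch_dtype=torch.float16,  # a real torch.dtype rather than the string "float16"
        ).to(self.device)
        self.pipe.set_progress_bar_config(disable=True)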
terminal output
I0505 06:10:39.222908 1872 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x7f6dcc000000' with size 268435456"
I0505 06:10:39.249076 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0505 06:10:39.249090 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 1 with size 67108864"
I0505 06:10:39.249098 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 2 with size 67108864"
I0505 06:10:39.249103 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 3 with size 67108864"
I0505 06:10:39.249109 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 4 with size 67108864"
I0505 06:10:39.249114 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 5 with size 67108864"
I0505 06:10:39.249121 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 6 with size 67108864"
I0505 06:10:39.249126 1872 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 7 with size 67108864"
I0505 06:10:40.352171 1872 model_lifecycle.cc:473] "loading: animaginexl:1"
I0505 06:10:40.352206 1872 model_lifecycle.cc:473] "loading: orpheus:1"
I0505 06:10:44.550052 1872 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: orpheus_0_0 (GPU device 1)"
I0505 06:10:46.034864 1872 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: animaginexl_0_0 (GPU device 0)"
Passed `torch_dtype` torch.float32 is not a `torch.dtype`. Defaulting to `torch.float32`.
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 4.03it/s]
I0505 06:10:53.637473 1872 model_lifecycle.cc:849] "successfully loaded 'animaginexl'"
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.64it/s]
I0505 06:10:58.887751 1872 model_lifecycle.cc:849] "successfully loaded 'orpheus'"
I0505 06:10:58.887944 1872 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0505 06:10:58.888073 1872 server.cc:631]
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------+
| Backend | Path | Config |
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/bac |
| | | kends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------+
I0505 06:10:58.888210 1872 server.cc:674]
+-------------+---------+--------+
| Model | Version | Status |
+-------------+---------+--------+
| animaginexl | 1 | READY |
| orpheus | 1 | READY |
+-------------+---------+--------+
I0505 06:10:59.123453 1872 metrics.cc:890] "Collecting metrics for GPU 0: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123490 1872 metrics.cc:890] "Collecting metrics for GPU 1: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123499 1872 metrics.cc:890] "Collecting metrics for GPU 2: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123507 1872 metrics.cc:890] "Collecting metrics for GPU 3: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123515 1872 metrics.cc:890] "Collecting metrics for GPU 4: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123523 1872 metrics.cc:890] "Collecting metrics for GPU 5: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123531 1872 metrics.cc:890] "Collecting metrics for GPU 6: NVIDIA A100 80GB PCIe"
I0505 06:10:59.123538 1872 metrics.cc:890] "Collecting metrics for GPU 7: NVIDIA A100 80GB PCIe"
I0505 06:10:59.177136 1872 metrics.cc:783] "Collecting CPU metrics"
I0505 06:10:59.177254 1872 tritonserver.cc:2598]
+----------------------------------+--------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.56.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration s |
| | ystem_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| model_config_name | |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| cuda_memory_pool_byte_size{1} | 67108864 |
| cuda_memory_pool_byte_size{2} | 67108864 |
| cuda_memory_pool_byte_size{3} | 67108864 |
| cuda_memory_pool_byte_size{4} | 67108864 |
| cuda_memory_pool_byte_size{5} | 67108864 |
| cuda_memory_pool_byte_size{6} | 67108864 |
| cuda_memory_pool_byte_size{7} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------+
I0505 06:10:59.180173 1872 grpc_server.cc:2560] "Started GRPCInferenceService at 0.0.0.0:8001"
I0505 06:10:59.180422 1872 http_server.cc:4755] "Started HTTPService at 0.0.0.0:8000"
I0505 06:10:59.221201 1872 http_server.cc:358] "Started Metrics Service at 0.0.0.0:8002"
Expected behavior
Each model instance should only occupy its assigned GPU. Instead, nvidia-smi shows the Python backend stub holding memory on every GPU:
yeleyi@node06:/data1/Aquila$ nvidia-smi
Mon May 5 14:05:59 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100 80GB PCIe Off | 00000000:67:00.0 Off | 0 |
| N/A 47C P0 67W / 300W | 14988MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100 80GB PCIe Off | 00000000:68:00.0 Off | 0 |
| N/A 48C P0 71W / 300W | 2746MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100 80GB PCIe Off | 00000000:6C:00.0 Off | 0 |
| N/A 46C P0 67W / 300W | 1870MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100 80GB PCIe Off | 00000000:6D:00.0 Off | 0 |
| N/A 46C P0 63W / 300W | 1870MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A100 80GB PCIe Off | 00000000:E5:00.0 Off | 0 |
| N/A 48C P0 74W / 300W | 1870MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A100 80GB PCIe Off | 00000000:E6:00.0 Off | 0 |
| N/A 47C P0 67W / 300W | 1870MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A100 80GB PCIe Off | 00000000:E7:00.0 Off | 0 |
| N/A 45C P0 67W / 300W | 1870MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A100 80GB PCIe Off | 00000000:E8:00.0 Off | 0 |
| N/A 48C P0 69W / 300W | 1486MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 3291444 C tritonserver 478MiB |
| 0 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 414MiB |
| 0 N/A N/A 3291794 C ...s/python/triton_python_backend_stub 14076MiB |
| 1 N/A N/A 3291444 C tritonserver 478MiB |
| 1 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 2254MiB |
| 2 N/A N/A 3291444 C tritonserver 478MiB |
| 2 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 1378MiB |
| 3 N/A N/A 3291444 C tritonserver 478MiB |
| 3 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 1378MiB |
| 4 N/A N/A 3291444 C tritonserver 478MiB |
| 4 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 1378MiB |
| 5 N/A N/A 3291444 C tritonserver 478MiB |
| 5 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 1378MiB |
| 6 N/A N/A 3291444 C tritonserver 478MiB |
| 6 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 1378MiB |
| 7 N/A N/A 3291444 C tritonserver 478MiB |
| 7 N/A N/A 3291693 C ...s/python/triton_python_backend_stub 994MiB |
+---------------------------------------------------------------------------------------+
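Not part of the original report: a small, hypothetical diagnostic that could be called from initialize() to log what each stub process is assigned versus what it can actually see (assumes torch is importable in the stub environment):

import os

import torch


def log_device_placement(args):
    # args is the dict Triton passes to TritonPythonModel.initialize().
    print("model_instance_device_id:", args.get("model_instance_device_id"))
    print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
    print("visible devices (torch):", torch.cuda.device_count())
    print("current device (torch):", torch.cuda.current_device())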