
Dynamically loaded models don't work with ensemble

Open • fran6co opened this issue 2 years ago • 8 comments

Description

When models are loaded dynamically through the load model API, the ensemble does not pick them up.

I0712 13:09:16.608657 1 model_repository_manager.cc:843] AsyncUnload() 'resize'
I0712 13:09:16.608668 1 model_repository_manager.cc:1136] TriggerNextAction() 'resize' version 1: 2
I0712 13:09:16.608673 1 model_repository_manager.cc:1216] Unload() 'resize' version 1
I0712 13:09:16.608674 1 model_repository_manager.cc:1223] unloading: resize:1
I0712 13:09:16.608698 1 model_repository_manager.cc:843] AsyncUnload() 'test'
E0712 13:09:16.608702 1 model_repository_manager.cc:1551] Invalid argument: ensemble test contains models that are not available: resize
I0712 13:09:16.608707 1 model_repository_manager.cc:713] VersionStates() 'resize'

Triton Information

Docker container 22.06-py3 (nvcr.io/nvidia/tritonserver:22.06-py3)

To Reproduce

file:1/dali.py

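import nvidia.dali as dali
import nvidia.dali.types as types
from nvidia.dali.plugin.triton import autoserialize


# @autoserialize marks this pipeline for the DALI backend to serialize when the model is loaded.
@autoserialize
@dali.pipeline_def(batch_size=3, num_threads=1, device_id=0)
def pipe():
    # The external_source name must match the DALI_INPUT_0 input in the model config.
    images = dali.fn.external_source(device="gpu", name="DALI_INPUT_0")
    # Resize to width 1280 (height scaled to keep the aspect ratio), float32 output (TYPE_FP32).
    images = dali.fn.resize(images, resize_x=1280, dtype=types.FLOAT)
    return images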

resize/config.json

{
  "name": "resize",
  "version_policy": {
    "latest": {
      "num_versions": 1
    }
  },
  "max_batch_size": 256,
  "input": [
    {
      "name": "DALI_INPUT_0",
      "data_type": "TYPE_UINT8",
      "dims": [
        "-1",
        "-1",
        "3"
      ]
    }
  ],
  "output": [
    {
      "name": "DALI_OUTPUT_0",
      "data_type": "TYPE_FP32",
      "dims": [
        "-1",
        "-1",
        "3"
      ]
    }
  ],
  "model_warmup": [
    {}
  ],
  "backend": "dali"
}

ensemble/config.json

{
  "name": "test",
  "platform": "ensemble",
  "version_policy": {
    "latest": {
      "num_versions": 1
    }
  },
  "input": [
    {
      "name": "images",
      "data_type": "TYPE_UINT8",
      "dims": [
        "1",
        "-1",
        "-1",
        "3"
      ]
    }
  ],
  "output": [
    {
      "name": "resized",
      "data_type": "TYPE_FP32",
      "dims": [
        "1",
        "-1",
        "-1",
        "3"
      ]
    }
  ],
  "ensemble_scheduling": {
    "step": [
      {
        "model_name": "resize",
        "model_version": "-1",
        "input_map": {
          "DALI_INPUT_0": "images"
        },
        "output_map": {
          "DALI_OUTPUT_0": "resized"
        }
      }
    ]
  }
}
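One way to confirm the first error (`ensemble test contains models that are not available: resize`) from the client side is to check, for each `ensemble_scheduling.step` in the ensemble config, whether the referenced model is ready. Below is a minimal sketch, assuming the ensemble config is the JSON shown above and the server listens on `localhost:8000`. `missing_step_models` is a hypothetical helper, not part of Triton, and readiness is only an approximation of the server's own availability check.

```python
import json
from typing import Callable, List


def missing_step_models(ensemble_config: dict, is_ready: Callable[[str], bool]) -> List[str]:
    """Return step model names from an ensemble config for which is_ready() is False.

    Hypothetical helper: it approximates, on the client side, the server's check
    that every model referenced by an ensemble has an available version.
    """
    steps = ensemble_config.get("ensemble_scheduling", {}).get("step", [])
    missing = []
    for step in steps:
        name = step["model_name"]
        # Report each missing model once, even if several steps use it.
        if not is_ready(name) and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")
    with open("ensemble/config.json") as f:
        config = json.load(f)
    # is_model_ready() asks the server whether the named model is ready to serve requests.
    print(missing_step_models(config, client.is_model_ready))
```

Passing the readiness check in as a callable keeps the config logic testable without a running server.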

Using:

client.load_model("resize", config=<resize.json>, files={"file:1/dali.py": <dali.py>})
client.load_model("test", config=<ensemble.json>, files={"file:1/trigger": b""})
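The two calls above can be turned into a runnable reproduction with the Python `tritonclient` HTTP client. This is a sketch under the following assumptions: the local layout is `resize/config.json`, `resize/1/dali.py` and `ensemble/config.json`, and the server from the command in this thread is reachable on `localhost:8000`. `build_load_requests` is a hypothetical helper that only assembles the arguments.

```python
from pathlib import Path


def build_load_requests(resize_dir: Path, ensemble_dir: Path):
    """Build (model_name, config_json, files) tuples for the two load_model calls.

    Assumes the hypothetical local layout resize/config.json, resize/1/dali.py
    and ensemble/config.json.
    """
    resize = (
        "resize",
        (resize_dir / "config.json").read_text(),
        # Override files are keyed as "file:<path relative to the model directory>".
        {"file:1/dali.py": (resize_dir / "1" / "dali.py").read_bytes()},
    )
    ensemble = (
        "test",
        (ensemble_dir / "config.json").read_text(),
        # Empty placeholder file, as in the report, so that a version directory 1/ exists.
        {"file:1/trigger": b""},
    )
    return [resize, ensemble]


def main():
    import tritonclient.http as httpclient
    from tritonclient.utils import InferenceServerException

    client = httpclient.InferenceServerClient(url="localhost:8000")
    for name, config, files in build_load_requests(Path("resize"), Path("ensemble")):
        try:
            client.load_model(name, config=config, files=files)
            print(f"loaded {name}")
        except InferenceServerException as e:
            print(f"failed to load {name}: {e}")
    # In the log in this thread, the failed ensemble load was followed by resize being unloaded.
    print("resize ready:", client.is_model_ready("resize"))


if __name__ == "__main__":
    main()
```

Loading the models one at a time and checking `resize` afterwards shows whether the ensemble failure also takes the dependency down.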

Expected behavior

The ensemble model should load, since a version of resize has already been loaded.

fran6co, Jul 12 '22 13:07

Hi @fran6co, could you also provide the full command you are using to run tritonserver? @GuanLuo, do you see anything that could help here?

krishung5, Jul 14 '22 16:07

docker run --gpus all --rm -p8000:8000 -p8001:8001 -p8002:8002 --ipc=host -v $HOME/.cache:/root/.cache nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/tmp --model-control-mode=explicit --strict-model-config=false

fran6co, Jul 14 '22 16:07

Can you share the complete log as well? Was "resize" successfully loaded?

GuanLuo, Jul 14 '22 18:07

Yes, "resize" was loaded successfully. I think I changed the logging settings, and it now shows more information:

I0715 07:37:27.708603 1 model_repository_manager.cc:1191] loading: resize:1
I0715 07:37:27.808864 1 dali_backend.cc:119] TRITONBACKEND_ModelInitialize: resize (version 1)
I0715 07:37:27.808877 1 dali_backend.cc:131] Repository location: /tmp/folderqufRYh
I0715 07:37:27.808879 1 dali_backend.cc:142] backend state is 'backend state'
I0715 07:37:27.809346 1 dali_model.h:151] DALI pipeline from file /tmp/folderqufRYh/1/model.dali loaded successfully.
I0715 07:37:27.810073 1 dali_backend.cc:190] TRITONBACKEND_ModelInstanceInitialize: resize (GPU device 0)
I0715 07:37:27.810614 1 model_repository_manager.cc:1345] successfully loaded 'resize' version 1
E0715 07:37:27.810673 1 model_repository_manager.cc:1571] failed to load model 'test': failed to open directory /tmp/folderV5JZli
I0715 07:37:27.813276 1 model_repository_manager.cc:1223] unloading: resize:1
E0715 07:37:27.813300 1 model_repository_manager.cc:860] Agent model returns error on TRITONREPOAGENT_ACTION_UNLOAD: Internal: Unexpected lifecycle state transition from TRITONREPOAGENT_ACTION_LOAD_FAIL to TRITONREPOAGENT_ACTION_UNLOAD
E0715 07:37:27.813307 1 model_repository_manager.cc:868] Agent model returns error on TRITONREPOAGENT_ACTION_UNLOAD_COMPLETE: Internal: Unexpected lifecycle state transition from TRITONREPOAGENT_ACTION_LOAD_FAIL to TRITONREPOAGENT_ACTION_UNLOAD_COMPLETE
E0715 07:37:27.813310 1 model_repository_manager.cc:1551] Unavailable: Request for unknown model: 'resize' has no available versions
I0715 07:37:27.813513 1 dali_backend.cc:223] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0715 07:37:27.813694 1 dali_backend.cc:169] TRITONBACKEND_ModelFinalize: delete model state
I0715 07:37:27.813713 1 model_repository_manager.cc:1328] successfully unloaded 'resize' version 1
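To answer "was resize successfully loaded?" directly from a log like the one above, the load and unload events can be extracted per model. A small sketch, assuming the message wording shown in this thread (`successfully loaded`, `successfully unloaded`, `failed to load model`); the patterns are written against these lines only and do not describe Triton's full log format. `model_timeline` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Patterns for three messages that appear in the log above; other messages are ignored.
_EVENTS = [
    (re.compile(r"successfully loaded '([^']+)' version (\d+)"), "loaded"),
    (re.compile(r"successfully unloaded '([^']+)' version (\d+)"), "unloaded"),
    (re.compile(r"failed to load model '([^']+)'"), "load failed"),
]
# Leading severity letter and MMDD date, followed by the HH:MM:SS.micro timestamp.
_TIME = re.compile(r"^[IWE]\d{4} (\d{2}:\d{2}:\d{2}\.\d+)")


def model_timeline(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (timestamp, model, event) tuples in the order they appear in the log."""
    timeline = []
    for line in log_text.splitlines():
        line = line.strip()
        ts = _TIME.match(line)
        for pattern, event in _EVENTS:
            m = pattern.search(line)
            if m:
                timeline.append((ts.group(1) if ts else "", m.group(1), event))
                break
    return timeline


if __name__ == "__main__":
    import sys

    for entry in model_timeline(sys.stdin.read()):
        print(*entry)
```

For the log above, this yields resize loaded, then test failing to load, then resize unloaded roughly 3 ms later.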

fran6co, Jul 15 '22 07:07

From the log: failed to load model 'test'. This could have caused the 'resize' model to be unloaded. Removing the 'test' model from the model repository might solve this issue. CC @GuanLuo

kthui, Jul 19 '22 18:07

@kthui, the test model is the ensemble that references resize. I tried the same configuration with the models placed in the model repository folder, and it works fine.

fran6co, Jul 19 '22 21:07

E0715 07:37:27.810673 1 model_repository_manager.cc:1571] failed to load model 'test': failed to open directory /tmp/folderV5JZli

It seems something goes wrong when setting up the ensemble model directory. Will investigate.
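The log paths (`Repository location: /tmp/folderqufRYh` for resize, `failed to open directory /tmp/folderV5JZli` for test) suggest that models loaded with `files` are materialized into a temporary directory on the server. A rough sketch of the directory the ensemble request would be expected to produce, assuming each `file:<relative path>` key maps to a file at that path under the model directory; `materialize_override_files` is hypothetical and is not Triton's implementation. The report passes an empty `1/trigger` file, presumably so that a version directory `1/` exists for the ensemble.

```python
import tempfile
from pathlib import Path
from typing import Dict


def materialize_override_files(files: Dict[str, bytes], root: Path) -> Path:
    """Write load_model-style override files under root and return root.

    Hypothetical illustration: each key "file:<relative path>" becomes root/<relative path>.
    """
    prefix = "file:"
    for key, content in files.items():
        if not key.startswith(prefix):
            raise ValueError(f"unexpected key {key!r}; expected 'file:<relative path>'")
        rel = Path(key[len(prefix):])
        # Refuse paths that would land outside the model directory.
        if rel.is_absolute() or ".." in rel.parts:
            raise ValueError(f"path escapes model directory: {rel}")
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory(prefix="folder") as d:
        model_dir = materialize_override_files({"file:1/trigger": b""}, Path(d))
        # List what the ensemble's model directory would contain.
        print(sorted(str(p.relative_to(model_dir)) for p in model_dir.rglob("*")))
```

Comparing this expected layout with what actually exists at the reported `/tmp/folder...` path is one way to narrow down where the directory setup fails.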

GuanLuo, Jul 20 '22 18:07

Filed a ticket for us to investigate further.

kthui, Jul 20 '22 18:07