Dynamically loaded models don't work with ensemble
Description
When models are loaded dynamically through the load model API, an ensemble that references them does not pick them up.
I0712 13:09:16.608657 1 model_repository_manager.cc:843] AsyncUnload() 'resize'
I0712 13:09:16.608668 1 model_repository_manager.cc:1136] TriggerNextAction() 'resize' version 1: 2
I0712 13:09:16.608673 1 model_repository_manager.cc:1216] Unload() 'resize' version 1
I0712 13:09:16.608674 1 model_repository_manager.cc:1223] unloading: resize:1
I0712 13:09:16.608698 1 model_repository_manager.cc:843] AsyncUnload() 'test'
E0712 13:09:16.608702 1 model_repository_manager.cc:1551] Invalid argument: ensemble test contains models that are not available: resize
I0712 13:09:16.608707 1 model_repository_manager.cc:713] VersionStates() 'resize'
Triton Information
docker container 22.06-py3
To Reproduce
file:1/dali.py
import nvidia.dali as dali
import nvidia.dali.types as types
from nvidia.dali.plugin.triton import autoserialize

@autoserialize
@dali.pipeline_def(batch_size=3, num_threads=1, device_id=0)
def pipe():
    images = dali.fn.external_source(device="gpu", name="DALI_INPUT_0")
    images = dali.fn.resize(images, resize_x=1280, dtype=types.FLOAT)
    return images
resize/config.json
{
  "name": "resize",
  "version_policy": {
    "latest": {
      "num_versions": 1
    }
  },
  "max_batch_size": 256,
  "input": [
    {
      "name": "DALI_INPUT_0",
      "data_type": "TYPE_UINT8",
      "dims": [
        "-1",
        "-1",
        "3"
      ]
    }
  ],
  "output": [
    {
      "name": "DALI_OUTPUT_0",
      "data_type": "TYPE_FP32",
      "dims": [
        "-1",
        "-1",
        "3"
      ]
    }
  ],
  "model_warmup": [
    {}
  ],
  "backend": "dali"
}
ensemble/config.json
{
  "name": "test",
  "platform": "ensemble",
  "version_policy": {
    "latest": {
      "num_versions": 1
    }
  },
  "input": [
    {
      "name": "images",
      "data_type": "TYPE_UINT8",
      "dims": [
        "1",
        "-1",
        "-1",
        "3"
      ]
    }
  ],
  "output": [
    {
      "name": "resized",
      "data_type": "TYPE_FP32",
      "dims": [
        "1",
        "-1",
        "-1",
        "3"
      ]
    }
  ],
  "ensemble_scheduling": {
    "step": [
      {
        "model_name": "resize",
        "model_version": "-1",
        "input_map": {
          "DALI_INPUT_0": "images"
        },
        "output_map": {
          "DALI_OUTPUT_0": "resized"
        }
      }
    ]
  }
}
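The ensemble depends on "resize" only through the `model_name` entries under `ensemble_scheduling.step`. As a rough client-side sanity check, the sketch below lists those names and reports any that are missing from a given set of loaded models. `composing_models` and `missing_models` are hypothetical helpers, not part of Triton, and the embedded config is a trimmed copy of the ensemble config above.

```python
import json

# Trimmed copy of the ensemble config above; only the scheduling part matters here.
ENSEMBLE_CONFIG = json.loads("""
{
  "name": "test",
  "platform": "ensemble",
  "ensemble_scheduling": {
    "step": [
      {
        "model_name": "resize",
        "model_version": "-1",
        "input_map": {"DALI_INPUT_0": "images"},
        "output_map": {"DALI_OUTPUT_0": "resized"}
      }
    ]
  }
}
""")


def composing_models(config):
    """Return the model names referenced by the ensemble steps, in order."""
    steps = config.get("ensemble_scheduling", {}).get("step", [])
    return [step["model_name"] for step in steps]


def missing_models(config, loaded):
    """Return composing models that are not in the `loaded` collection."""
    return [name for name in composing_models(config) if name not in loaded]


if __name__ == "__main__":
    print(composing_models(ENSEMBLE_CONFIG))
    print(missing_models(ENSEMBLE_CONFIG, loaded={"resize"}))
    print(missing_models(ENSEMBLE_CONFIG, loaded=set()))
```

This only mirrors the dependency that the server reports in the "contains models that are not available" error; it does not reproduce how the server resolves it.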
Using:
client.load_model("resize", config=<resize.json>, files={"file:1/dali.py": <dali.py>})
client.load_model("test", config=<ensemble.json>, files={"file:1/trigger": b""})
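For reference, the two calls above could be written out roughly as follows. This is a minimal sketch, assuming the `tritonclient.http` client and a client version whose `load_model` accepts the `config` and `files` arguments used in this report. `build_load_args` and `load_models` are hypothetical helpers introduced here for illustration.

```python
import json


def build_load_args(config, files):
    """Build keyword arguments for InferenceServerClient.load_model.

    `config` is a dict holding the model configuration; `files` maps a path
    inside the model directory (e.g. "1/dali.py") to its content as bytes.
    """
    prefixed = {}
    for path, content in files.items():
        if not isinstance(content, bytes):
            raise TypeError(f"content for {path!r} must be bytes")
        # The calls in this report use override file keys of the form "file:<path>".
        key = path if path.startswith("file:") else f"file:{path}"
        prefixed[key] = content
    return {"config": json.dumps(config), "files": prefixed}


def load_models(url, models):
    """Load each (name, config, files) tuple in order; dependencies go first."""
    # Imported lazily so the helper above works without tritonclient installed.
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url=url)
    for name, config, files in models:
        client.load_model(name, **build_load_args(config, files))
        # Assumption: readiness is a reasonable proxy for a successful load.
        print(name, "ready:", client.is_model_ready(name))
```

With the server command from this thread, `load_models("localhost:8000", [...])` would be called with the "resize" entry first and the "test" ensemble second, so that the composing model is already loaded when the ensemble is requested.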
Expected behavior
The ensemble model should simply load, since a version of resize has already been loaded.
Hi @fran6co, could you also provide the full command you are using to run tritonserver? @GuanLuo Do you see anything which could help here?
docker run --gpus all --rm -p8000:8000 -p8001:8001 -p8002:8002 --ipc=host -v $HOME/.cache:/root/.cache nvcr.io/nvidia/tritonserver:22.06-py3 tritonserver --model-repository=/tmp --model-control-mode=explicit --strict-model-config=false
Can you share the complete log as well? Was "resize" successfully loaded?
Yes, "resize" was successfully loaded. I think I changed the logging and it's showing some more info:
I0715 07:37:27.708603 1 model_repository_manager.cc:1191] loading: resize:1
I0715 07:37:27.808864 1 dali_backend.cc:119] TRITONBACKEND_ModelInitialize: resize (version 1)
I0715 07:37:27.808877 1 dali_backend.cc:131] Repository location: /tmp/folderqufRYh
I0715 07:37:27.808879 1 dali_backend.cc:142] backend state is 'backend state'
I0715 07:37:27.809346 1 dali_model.h:151] DALI pipeline from file /tmp/folderqufRYh/1/model.dali loaded successfully.
I0715 07:37:27.810073 1 dali_backend.cc:190] TRITONBACKEND_ModelInstanceInitialize: resize (GPU device 0)
I0715 07:37:27.810614 1 model_repository_manager.cc:1345] successfully loaded 'resize' version 1
E0715 07:37:27.810673 1 model_repository_manager.cc:1571] failed to load model 'test': failed to open directory /tmp/folderV5JZli
I0715 07:37:27.813276 1 model_repository_manager.cc:1223] unloading: resize:1
E0715 07:37:27.813300 1 model_repository_manager.cc:860] Agent model returns error on TRITONREPOAGENT_ACTION_UNLOAD: Internal: Unexpected lifecycle state transition from TRITONREPOAGENT_ACTION_LOAD_FAIL to TRITONREPOAGENT_ACTION_UNLOAD
E0715 07:37:27.813307 1 model_repository_manager.cc:868] Agent model returns error on TRITONREPOAGENT_ACTION_UNLOAD_COMPLETE: Internal: Unexpected lifecycle state transition from TRITONREPOAGENT_ACTION_LOAD_FAIL to TRITONREPOAGENT_ACTION_UNLOAD_COMPLETE
E0715 07:37:27.813310 1 model_repository_manager.cc:1551] Unavailable: Request for unknown model: 'resize' has no available versions
I0715 07:37:27.813513 1 dali_backend.cc:223] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0715 07:37:27.813694 1 dali_backend.cc:169] TRITONBACKEND_ModelFinalize: delete model state
I0715 07:37:27.813713 1 model_repository_manager.cc:1328] successfully unloaded 'resize' version 1
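To make the order of failures in a log like the one above easier to see, the error lines can be filtered out with a small script. This is only a sketch: the regular expression assumes the glog-style prefix seen in these logs (severity letter, date, time, thread id, `file:line]`), and `errors` is a hypothetical helper name.

```python
import re

# Matches lines such as:
# E0715 07:37:27.810673 1 model_repository_manager.cc:1571] message
LOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def errors(log_text):
    """Return (time, source file, message) for every E-severity line."""
    found = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("severity") == "E":
            found.append(
                (match.group("time"), match.group("file"), match.group("message"))
            )
    return found


# Three lines copied from the log above.
SAMPLE = """\
I0715 07:37:27.810614 1 model_repository_manager.cc:1345] successfully loaded 'resize' version 1
E0715 07:37:27.810673 1 model_repository_manager.cc:1571] failed to load model 'test': failed to open directory /tmp/folderV5JZli
I0715 07:37:27.813276 1 model_repository_manager.cc:1223] unloading: resize:1
"""

if __name__ == "__main__":
    for time, source, message in errors(SAMPLE):
        print(time, source, message)
```

Applied to the full log, the first error reported is the failure to open the directory for 'test', which precedes the unload of 'resize'.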
From the log: failed to load model 'test'. This could have caused the 'resize' model to be unloaded. Removing the 'test' model from the model repository might solve this issue. CC @GuanLuo
@kthui the test model is the ensemble, and it references the resize model. I tried the same configuration from the model repository folder, and it works just fine.
E0715 07:37:27.810673 1 model_repository_manager.cc:1571] failed to load model 'test': failed to open directory /tmp/folderV5JZli
It seems something goes wrong when setting up the ensemble model directory. Will investigate.
Filed a ticket for us to investigate further.