seldon-core
Incompatibility of metadata access in multi-model Triton nodes
Describe the bug
In prepackaged multi-model Triton servers, only one of the deployed models' V2 metadata endpoints is accessible under `v2/models/${MODEL_NAME}`. This is because the `name` field under the predictor's `graph` accepts only a single model name. For example, I deployed two similar models in the Seldon prepackaged Triton server; both load and serve inference successfully under the `/infer` endpoint. For metadata, however, only one name can be supplied (`onnx-gpt2-model1` in the example below), so only that model's metadata is exposed.
```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: gpt2-multi
spec:
  predictors:
  - graph:
      implementation: TRITON_SERVER
      logger:
        mode: all
      modelUri: s3://language-models-multi
      envSecretRefName: seldon-init-container-secret
      name: onnx-gpt2-model1
      type: MODEL
    name: default
    replicas: 1
    protocol: kfserving
```
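For context, `modelUri` points at a model repository containing both models. A sketch of the assumed layout, following the standard Triton model-repository convention (directory names taken from the example; `config.pbtxt` is optional for ONNX models with autocomplete enabled):

```
language-models-multi/
├── onnx-gpt2-model1/
│   ├── config.pbtxt
│   └── 1/
│       └── model.onnx
└── onnx-gpt2-model2/
    ├── config.pbtxt
    └── 1/
        └── model.onnx
```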
Both models deploy successfully on the Triton server:
```
I0730 15:48:08.086312 1 server.cc:586]
+------------------+---------+--------+
| Model            | Version | Status |
+------------------+---------+--------+
| onnx-gpt2-model1 | 1       | READY  |
| onnx-gpt2-model2 | 1       | READY  |
+------------------+---------+--------+
```
The `onnx-gpt2-model1` metadata is accessible:

```
curl -s http://localhost:32000/seldon/default/gpt2-multi/v2/models/onnx-gpt2-model1
{"name":"onnx-gpt2-model1","versions":["1"],"platform":"onnxruntime_onnx","inputs":[{"name":"input_ids","datatype":"INT32","shape":[-1,-1]},{"name":"attention_mask","datatype":"INT32","shape":[-1,-1]}],"outputs":[{"name":"past_key_values","datatype":"FP32","shape":[12,2,-1,12,-1,64]},{"name":"logits","datatype":"FP32","shape":[-1,-1,50257]}]}
```
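For reference, the V2 metadata response above is plain JSON and can be inspected programmatically. A minimal sketch (the body is copied verbatim from the response above):

```python
import json

# V2 metadata response returned by /v2/models/onnx-gpt2-model1 (copied from above)
metadata = json.loads(
    '{"name":"onnx-gpt2-model1","versions":["1"],"platform":"onnxruntime_onnx",'
    '"inputs":[{"name":"input_ids","datatype":"INT32","shape":[-1,-1]},'
    '{"name":"attention_mask","datatype":"INT32","shape":[-1,-1]}],'
    '"outputs":[{"name":"past_key_values","datatype":"FP32","shape":[12,2,-1,12,-1,64]},'
    '{"name":"logits","datatype":"FP32","shape":[-1,-1,50257]}]}'
)

# Extract tensor names from the V2 metadata schema
input_names = [t["name"] for t in metadata["inputs"]]
output_names = [t["name"] for t in metadata["outputs"]]
print(metadata["name"], input_names, output_names)
```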
But the `onnx-gpt2-model2` metadata is not:

```
curl -s http://localhost:32000/seldon/default/gpt2-multi/v2/models/onnx-gpt2-model2
{"status":{"code":500,"info":"Failed to find model onnx-gpt2-model2","status":"FAILURE"}}
```
I think the `name` field should accept a list of names (one per deployed model) instead of a single name; alternatively, the names could be generated automatically from the models loaded by the Triton server.
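A sketch of the first option, to make the proposal concrete. This is hypothetical syntax that Seldon Core does not currently accept; it only illustrates the shape of the suggested change:

```yaml
# Hypothetical syntax -- not valid in current Seldon Core
- graph:
    implementation: TRITON_SERVER
    modelUri: s3://language-models-multi
    name:              # hypothetical list form instead of a single string
    - onnx-gpt2-model1
    - onnx-gpt2-model2
    type: MODEL
```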
To reproduce
Following the Pretrained GPT2 Model Deployment Example, I generated two similar models instead of one.
Expected behaviour
All models' metadata endpoints should be available.
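The expectation above can be stated as a small check. A sketch that builds the metadata URL for every deployed model (host, port, and model names are taken from the example earlier in this issue; the actual HTTP call is commented out so the sketch runs offline):

```python
# Expected behaviour: every deployed model's V2 metadata endpoint should respond.
base = "http://localhost:32000/seldon/default/gpt2-multi"
models = ["onnx-gpt2-model1", "onnx-gpt2-model2"]

urls = [f"{base}/v2/models/{m}" for m in models]
for url in urls:
    print(url)
    # e.g. with the requests library, against a live cluster:
    # assert requests.get(url).status_code == 200
```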
Environment
- Cloud Provider: Bare Metal
- Kubernetes Cluster Version:

```
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.2", GitCommit:"f66044f4361b9f1f96f0053dd46cb7dce5e990a8", GitTreeState:"clean", BuildDate:"2022-06-15T14:22:29Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.0-2+59bbb3530b6769", GitCommit:"59bbb3530b6769e4935a05ac0e13c9910c79253e", GitTreeState:"clean", BuildDate:"2022-05-13T06:41:13Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
```
- Deployed Seldon System Images:
  - docker.io/seldonio/seldon-core-executor:1.14.0
  - docker.io/seldonio/seldon-core-operator:1.14.0
I think this is related to #4240 and would be solved by the clearer Model semantics we are investigating.
Please test in V2