seldon-core
Intermediate models metadata endpoints are not accessible in seldondeployments with separate pods
Describe the bug
This is a follow-up to https://github.com/SeldonIO/seldon-core/issues/4092 and may be a bug. The metadata endpoints of intermediate models in an inference graph do not seem to be accessible when the nodes are deployed in separate pods. I built three dummy nodes: (1) MODEL -> node-one, (2) MODEL -> node-two, and (3) COMBINER -> node-combiner, each returning some metadata from its init_metadata function. I deployed the graph in two layouts:
- All containers in a single pod
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: combiner-single-pod
spec:
  name: combiner-single-pod
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: sdghafouri/templates:combiner
          name: node-combiner
          imagePullPolicy: Always
        - image: sdghafouri/templates:model1
          name: node-one
          imagePullPolicy: Always
        - image: sdghafouri/templates:model2
          name: node-two
          imagePullPolicy: Always
    graph:
      name: node-combiner
      type: COMBINER
      children:
      - name: node-one
        type: MODEL
        children: []
      - name: node-two
        type: MODEL
        children: []
    name: example
    labels:
      sidecar.istio.io/inject: "true"
    replicas: 1
- Containers in separate pods
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: combiner-separate-pods
spec:
  name: combiner-separate-pods
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: sdghafouri/templates:combiner
          name: node-combiner
          imagePullPolicy: Always
    - spec:
        containers:
        - image: sdghafouri/templates:model1
          name: node-one
          imagePullPolicy: Always
    - spec:
        containers:
        - image: sdghafouri/templates:model2
          name: node-two
          imagePullPolicy: Always
    graph:
      name: node-combiner
      type: COMBINER
      children:
      - name: node-one
        type: MODEL
        children: []
      - name: node-two
        type: MODEL
        children: []
    name: example
    labels:
      sidecar.istio.io/inject: "true"
    replicas: 1
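For reference, the three dummy nodes could look roughly like the sketch below. The images themselves are not shown in this issue, so the class names, the pass-through predict, the averaging aggregate, and every metadata field other than the "custom" author and the tensor message type seen in the responses are assumptions made for illustration. The method names follow the Seldon Core v1 Python wrapper conventions (predict, aggregate, init_metadata).

```python
# Hypothetical stand-ins for the three dummy nodes; not the code in the
# sdghafouri/templates images.


def _dummy_metadata(name):
    # Shape of the dict returned from init_metadata in a Seldon Core v1
    # Python component. Field values are placeholders.
    return {
        "name": name,
        "versions": ["v1"],
        "platform": "seldon",
        "inputs": [
            {"messagetype": "tensor", "schema": {"names": ["a", "b"], "shape": [2]}}
        ],
        "outputs": [{"messagetype": "tensor", "schema": {"shape": [2]}}],
        "custom": {"author": "seldon-dev"},
    }


class NodeOne:
    """MODEL node: returns its input unchanged."""

    def predict(self, X, features_names=None):
        return X

    def init_metadata(self):
        return _dummy_metadata("node-one")


class NodeTwo:
    """MODEL node: returns its input unchanged."""

    def predict(self, X, features_names=None):
        return X

    def init_metadata(self):
        return _dummy_metadata("node-two")


class NodeCombiner:
    """COMBINER node: element-wise mean of the children's outputs."""

    def aggregate(self, features_list, feature_names_list):
        return [sum(col) / len(col) for col in zip(*features_list)]

    def init_metadata(self):
        return _dummy_metadata("node-combiner")
```

Each class would be packaged into its own image, and the container names in the YAML files above are what the /api/v1.0/metadata/<node-name> path refers to.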
In the first case (single pod), the metadata endpoints of the combiner and of both model nodes are accessible as expected, at both node and graph level. For the combiner:
curl http://localhost:32000/seldon/saeid/combiner-single-pod/api/v1.0/metadata/node-combiner | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 506 100 506 0 0 16322 0 --:--:-- --:--:-- --:--:-- 16866
{
  "custom": {
    "author": "seldon-dev"
  },
  "inputs": [
    {
      "messagetype": "tensor",
      "schema": {
        "names": [
...
For the models:
curl http://localhost:32000/seldon/saeid/combiner-single-pod/api/v1.0/metadata/node-one | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 506 100 506 0 0 25300 0 --:--:-- --:--:-- --:--:-- 25300
{
  "custom": {
    "author": "seldon-dev"
  },
  "inputs": [
    {
      "messagetype": "tensor",
...
However, in the separate-pods case only the combiner node's metadata is available:
curl http://localhost:32000/seldon/saeid/combiner-separate-pods/api/v1.0/metadata/node-combiner | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 506 100 506 0 0 24095 0 --:--:-- --:--:-- --:--:-- 25300
{
  "custom": {
    "author": "seldon-dev"
  },
  "inputs": [
    {
      "messagetype": "tensor",
...
But the model metadata endpoints are not accessible. The request below takes 30 seconds and returns a body that is not valid JSON (node-two behaves the same way):
curl http://localhost:32000/seldon/saeid/combiner-separate-pods/api/v1.0/metadata/node-one | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 91 100 91 0 0 3 0 0:00:30 0:00:30 --:--:-- 23
parse error: Invalid numeric literal at line 1, column 9
I also tried both cases with the seldon.io/engine-separate-pod: "true" annotation, which puts the orchestrator in a separate pod, but the results were the same.
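The curl checks above can be repeated for every node of both deployments with a small script. This is only a sketch: the helper names metadata_url, probe and check_all are made up here, the 35-second timeout is an arbitrary value chosen to be longer than the 30 seconds observed above, and the URL layout is copied from the curl commands in this issue.

```python
import json
import urllib.error
import urllib.request

DEPLOYMENTS = ("combiner-single-pod", "combiner-separate-pods")
NODES = ("node-combiner", "node-one", "node-two")


def metadata_url(base, namespace, deployment, node):
    # Same path layout as the curl commands above.
    return f"{base}/seldon/{namespace}/{deployment}/api/v1.0/metadata/{node}"


def probe(url, timeout=35.0, opener=urllib.request.urlopen):
    """Return (ok, detail); ok is True only if the body parses as JSON."""
    try:
        with opener(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as exc:
        # Covers HTTP error statuses, refused connections and timeouts.
        return False, f"request failed: {exc}"
    try:
        json.loads(body)
    except ValueError:
        # This is the situation jq reports as a parse error.
        return False, f"non-JSON body: {body[:80]!r}"
    return True, "valid JSON metadata"


def check_all(base, namespace, opener=urllib.request.urlopen):
    """Probe every node of both deployments; keys are (deployment, node)."""
    results = {}
    for deployment in DEPLOYMENTS:
        for node in NODES:
            url = metadata_url(base, namespace, deployment, node)
            results[(deployment, node)] = probe(url, opener=opener)
    return results
```

Calling check_all("http://localhost:32000", "saeid") against the cluster described here should flag node-one and node-two of combiner-separate-pods if the behaviour reported above is reproduced.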
To reproduce
- The images are public, so deploying the YAML files above reproduces both graphs.
Expected behaviour
Access to the metadata endpoints should be the same in separate-pod and single-pod deployments.
Environment
- Cloud Provider: Microk8s
- Kubernetes Cluster Version v1.22.9
- Deployed Seldon System Images:
value: docker.io/seldonio/seldon-core-executor:1.13.1
image: docker.io/seldonio/seldon-core-operator:1.13.1
I am not sure whether this is related to exposing the Services (SVCs) correctly, as in https://github.com/SeldonIO/seldon-core/pull/4043.
In v2 you can call metadata directly on each model.