seldon-core
OpenShift: SeldonDeployment not managed by the correct Operator pod when deploying multiple namespace-scoped Operator installs
Describe the bug
When Seldon is installed as a namespace-scoped operator in multiple namespaces, SeldonDeployment objects created in the second namespace are managed and deployed by the operator pod running in the first namespace. If the operator in the first namespace is then uninstalled, any SeldonDeployment created in the second namespace (where an operator is still installed) fails with a webhook error pointing to the now non-existent service in the first namespace.
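The webhook error suggests that an admission webhook registered by the first install also matches objects in the second namespace. One way to check this is to dump the webhook configurations (for example with oc get validatingwebhookconfigurations,mutatingwebhookconfigurations -o json) and ask which webhook services would be called for a SeldonDeployment in seldon-test-2. The Python below is a simplified sketch of that check, not a confirmed root cause: it evaluates only namespaceSelector.matchLabels (Kubernetes treats a missing selector as matching every namespace), ignores matchExpressions, objectSelector and API groups, and the webhook name and the service name seldon-webhook-service are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Namespace labels as seen on Kubernetes 1.21+, which adds the
# kubernetes.io/metadata.name label to every namespace automatically.
TEST2_LABELS = {"kubernetes.io/metadata.name": "seldon-test-2"}

# Hypothetical webhook registered by the seldon-test-1 install with no
# namespaceSelector, so it applies to every namespace in the cluster.
UNSCOPED = {
    "webhooks": [{
        "name": "vseldondeployment.example",  # hypothetical name
        "rules": [{"resources": ["seldondeployments"]}],
        "clientConfig": {"service": {"namespace": "seldon-test-1",
                                     "name": "seldon-webhook-service"}},
    }]
}

# The same hypothetical webhook, restricted to its own namespace.
SCOPED = {
    "webhooks": [{
        **UNSCOPED["webhooks"][0],
        "namespaceSelector": {
            "matchLabels": {"kubernetes.io/metadata.name": "seldon-test-1"}},
    }]
}


def selector_matches(selector: Optional[Dict], ns_labels: Dict[str, str]) -> bool:
    """Evaluate matchLabels only; a missing or empty selector matches everything."""
    match_labels = (selector or {}).get("matchLabels", {})
    return all(ns_labels.get(k) == v for k, v in match_labels.items())


def intercepting_services(configs: List[Dict], ns_labels: Dict[str, str],
                          resource: str = "seldondeployments") -> List[Tuple[str, str]]:
    """Return (namespace, name) of each webhook service that would be called
    for `resource` created in a namespace carrying `ns_labels`."""
    hits = []
    for cfg in configs:
        for hook in cfg.get("webhooks", []):
            resources = [r for rule in hook.get("rules", [])
                         for r in rule.get("resources", [])]
            if resource not in resources and "*" not in resources:
                continue
            if not selector_matches(hook.get("namespaceSelector"), ns_labels):
                continue
            svc = hook.get("clientConfig", {}).get("service")
            if svc:
                hits.append((svc["namespace"], svc["name"]))
    return hits


if __name__ == "__main__":
    # Unscoped: the seldon-test-1 service is called for seldon-test-2 objects.
    print(intercepting_services([UNSCOPED], TEST2_LABELS))
    # Scoped: nothing from seldon-test-1 intercepts seldon-test-2 objects.
    print(intercepting_services([SCOPED], TEST2_LABELS))
```

If a real configuration behaves like UNSCOPED, deleting the first install leaves a webhook that still points at a service that no longer exists, which would match the error described above.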
To reproduce
- oc new-project seldon-test-1
- oc new-project seldon-test-2
- Install Seldon via the OperatorHub console using a namespace-scoped install in the seldon-test-1 namespace, or create the following YAML objects:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: seldon-operator-certified
  namespace: seldon-test-1
spec:
  channel: stable
  installPlanApproval: Automatic
  name: seldon-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: seldon-test-1
  namespace: seldon-test-1
spec:
  targetNamespaces:
  - seldon-test-1
```
- Repeat step three for seldon-test-2, or create the following objects:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: seldon-operator-certified
  namespace: seldon-test-2
spec:
  channel: stable
  installPlanApproval: Automatic
  name: seldon-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: seldon-test-2
  namespace: seldon-test-2
spec:
  targetNamespaces:
  - seldon-test-2
```
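The two installs differ only in the namespace, so the manifests can be generated instead of copied by hand. The Python below is a small sketch that rebuilds the OperatorGroup and Subscription shown in the steps for any namespace and prints them as JSON. Wrapping them in a kind: List object (apiVersion v1) so both can be applied in one call is a choice made for this sketch, and the script name install_manifests.py is hypothetical.

```python
import json
import sys
from typing import Dict


def namespaced_install(namespace: str, channel: str = "stable") -> Dict:
    """Build the OperatorGroup and Subscription from the reproduction steps
    for a namespace-scoped (OwnNamespace) install of seldon-operator-certified."""
    operator_group = {
        "apiVersion": "operators.coreos.com/v1",
        "kind": "OperatorGroup",
        "metadata": {"name": namespace, "namespace": namespace},
        # Restrict the operator to its own namespace.
        "spec": {"targetNamespaces": [namespace]},
    }
    subscription = {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": "seldon-operator-certified", "namespace": namespace},
        "spec": {
            "channel": channel,
            "installPlanApproval": "Automatic",
            "name": "seldon-operator-certified",
            "source": "certified-operators",
            "sourceNamespace": "openshift-marketplace",
        },
    }
    # A v1 List lets both objects be applied together.
    return {"apiVersion": "v1", "kind": "List",
            "items": [operator_group, subscription]}


if __name__ == "__main__":
    ns = sys.argv[1] if len(sys.argv) > 1 else "seldon-test-1"
    print(json.dumps(namespaced_install(ns), indent=2))
```

Usage, one namespace at a time: python install_manifests.py seldon-test-2 | oc apply -f -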
- Follow the logs of the operator deployed in seldon-test-1:

```
oc logs $(oc get pod -l control-plane=seldon-controller-manager -o name -n seldon-test-1) --follow -n seldon-test-1
```
- Create a SeldonDeployment in seldon-test-2:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
    app.kubernetes.io/instance: seldon1
    app.kubernetes.io/name: seldon
    app.kubernetes.io/version: v0.5
  name: seldon-model
  namespace: seldon-test-2
spec:
  name: test-deployment
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: 'seldonio/mock_classifier:1.6.0'
          name: classifier
    graph:
      children: []
      name: classifier
      type: MODEL
    name: example
    replicas: 1
status: {}
```
Expected behaviour
The SeldonDeployment created in seldon-test-2 should be managed by the operator deployed in seldon-test-2, not by the operator deployed in seldon-test-1. Instead, the logs of the operator in seldon-test-1 show that it is deploying the new resource, while the operator in seldon-test-2 shows no activity.
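The expected behaviour follows from OLM OperatorGroup semantics: an install whose targetNamespaces lists only its own namespace should act only on objects in that namespace, and an OperatorGroup with no target namespaces selects all namespaces. The Python below is a simplified model of that rule applied to the two OperatorGroups from the reproduction steps. It is an assumption-laden sketch: only targetNamespaces is evaluated (selector-based groups are not handled), and it says nothing about how the Seldon operator itself decides which namespaces to watch.

```python
from typing import Dict, List


def operator_group(namespace: str) -> Dict:
    """OperatorGroup shaped like the ones in the reproduction steps."""
    return {"metadata": {"name": namespace, "namespace": namespace},
            "spec": {"targetNamespaces": [namespace]}}


def expected_managers(groups: List[Dict], object_namespace: str) -> List[str]:
    """Namespaces of the operator installs whose OperatorGroup covers
    object_namespace. A group with no targetNamespaces is treated as
    AllNamespaces; label selectors are ignored in this sketch."""
    managers = []
    for group in groups:
        targets = group.get("spec", {}).get("targetNamespaces")
        if not targets or object_namespace in targets:
            managers.append(group["metadata"]["namespace"])
    return managers


if __name__ == "__main__":
    groups = [operator_group("seldon-test-1"), operator_group("seldon-test-2")]
    # Expected: only the seldon-test-2 install should manage this object.
    print(expected_managers(groups, "seldon-test-2"))
```

The behaviour reported in this issue is the opposite: the seldon-test-1 install acts on the object even though, under this rule, it is not in its target namespaces.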
Environment
- Cloud Provider: OpenShift 4.8
- Kubernetes Cluster Version (output of oc version):

```
Client Version: 4.7.0-202106032231.p0.git.e29b355-e29b355
Server Version: 4.8.31
Kubernetes Version: v1.21.6+b82a451
```

- Deployed Seldon System Images (output of oc get deployment seldon-controller-manager -o yaml | grep seldonio):

```
"image": "seldonio/mock_classifier:1.6.0",
containerImage: registry.connect.redhat.com/seldonio/seldon-core-operator@sha256:dbea873072acda45863dabc94555d39b9f48670ac04fec3e835c90531f1a2eda
value: registry.connect.redhat.com/seldonio/seldon-core-executor@sha256:4aab0706cef5ae37e3d62ba3cc4f92bc5d0b0e18d0d953143d25d745696ecc54
value: registry.connect.redhat.com/seldonio/seldon-engine@sha256:0abffc5f882b16a7ffa9ecfb7d1b362d8e8017b47e94b4b72b10dce74daeec65
value: registry.connect.redhat.com/seldonio/storage-initializer@sha256:554547229653bf1ebedb88c8eec40e63c8282146e0a0ea14f2d47f000004439f
value: registry.connect.redhat.com/seldonio/sklearnserver@sha256:0a68b243b28d2dc273a3278393e9f986b6ed5fc29aea7b9f3f343bf9efc5ac8e
value: registry.connect.redhat.com/seldonio/xgboostserver@sha256:0ee0001730d21ca636417824655d74d423d4492d7fed28befc74c23caf0cc4c8
value: registry.connect.redhat.com/seldonio/mlflowserver@sha256:3381cad85a16434f48cd65afbcfc65f2693bee46a149aa143a4088c9f9d899a2
value: registry.connect.redhat.com/seldonio/tfproxy@sha256:407932f2d8e670bb4d3b9f6670a687039fae8d79312269d4da2677b73dc3e301
value: registry.connect.redhat.com/seldonio/tensorflow-serving@sha256:04d1eee0208ca0e64ae277197cb1ddff4c0ee143a712a7f7a30faec397239dfc
value: registry.connect.redhat.com/seldonio/alibiexplainer@sha256:6ec697ad5187639712701454198a5f6afa1f662d3c127641c06b4322f391d6dd
value: registry.connect.redhat.com/seldonio/mock-classifier@sha256:ea78453871e656b71ec9ce4660623a58938d0492fb8b660cc36a1943c768ce4d
value: docker.io/seldonio/engine:1.12.0
value: seldonio/seldon-core-executor:1.12.0
image: registry.connect.redhat.com/seldonio/seldon-core-operator@sha256:dbea873072acda45863dabc94555d39b9f48670ac04fec3e835c90531f1a2eda
```
Model Details
Using the default example model
Hi @strangiato,
Thanks for bringing this to our attention. Is the namespace installation something you are actively looking to use?
Hi Rafal, yes, this is the default deployment strategy when Seldon is deployed from OpenDataHub on OpenShift.
Interesting. Is it possible to deploy on OpenDataHub using the "All namespaces on the cluster" option in the meantime?
The ODH operator itself is generally deployed as a cluster-scoped operator, but when a user chooses to deploy Seldon, it deploys Seldon as a namespace-scoped operator in that specific user's namespace.
For the sake of documentation, I created a corresponding issue for the ODH project here:
https://issues.redhat.com/browse/ODH-608
So ODH users cannot install the Seldon Operator cluster-wide, then. In the meantime, could cluster administrators install both ODH and the Seldon Operator (available in all namespaces) so that ODH users only need to create SeldonDeployments?
Yeah, that was the workaround I ended up implementing as an immediate resolution for my specific use case.
@strangiato Is this still an issue for you, or is the workaround OK?
The workaround is fine for now, but I would still consider this a bug and a potential security vulnerability for anyone installing in namespaced mode.
We are seeing this on Seldon Core Operator 1.16.0 on GKE 1.24 and 1.25 with namespaced scope, exactly as originally stated:
If the first version of the operator is uninstalled, any SeldonDeployment objects created in the second namespace where the original operator is installed will fail with a webhook error pointing to the non-existent service in the original namespace.