Openshift SeldonDeployment not managed by correct Operator pod when deploying multiple namespace scoped Operator Installs

Open strangiato opened this issue 3 years ago • 9 comments

Describe the bug

When installing Seldon as a namespaced operator in multiple namespaces, the SeldonDeployment objects deployed in the second namespace are managed and deployed by the operator pod running in the first namespace. If the first version of the operator is uninstalled, any SeldonDeployment objects created in the second namespace where the original operator is installed will fail with a webhook error pointing to the non-existent service in the original namespace.
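
One way to check which operator install the admission webhooks route to is to list every webhook configuration next to the namespace of the Service it calls. This is only a minimal sketch and not part of the original report: `list_webhook_targets` is a hypothetical helper name, and because the names of the Seldon webhook configurations are not given here, it lists all configurations rather than filtering.

```shell
# Hypothetical helper: print each validating/mutating webhook configuration
# followed by the namespace/name of the Service that its webhooks call.
# Webhooks configured with a URL instead of a Service print an empty "/".
list_webhook_targets() {
  oc get validatingwebhookconfigurations,mutatingwebhookconfigurations \
    -o jsonpath='{range .items[*]}{.kind}{"\t"}{.metadata.name}{"\t"}{range .webhooks[*]}{.clientConfig.service.namespace}{"/"}{.clientConfig.service.name}{" "}{end}{"\n"}{end}'
}
```

Assuming the Seldon entries contain "seldon" in their names, `list_webhook_targets | grep -i seldon` would show whether they reference a Service in seldon-test-1 only.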

To reproduce

  1. oc new-project seldon-test-1
  2. oc new-project seldon-test-2
  3. Install Seldon via the OperatorHub console using a namespace scoped install in the seldon-test-1 namespace:

or create the following yaml objects:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: seldon-operator-certified
  namespace: seldon-test-1
spec:
  channel: stable
  installPlanApproval: Automatic
  name: seldon-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: seldon-test-1
  namespace: seldon-test-1
spec:
  targetNamespaces:
    - seldon-test-1
  4. Repeat step 3 for seldon-test-2, or create the following objects:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: seldon-operator-certified
  namespace: seldon-test-2
spec:
  channel: stable
  installPlanApproval: Automatic
  name: seldon-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: seldon-test-2
  namespace: seldon-test-2
spec:
  targetNamespaces:
    - seldon-test-2
  5. Follow the logs for the operator deployed in seldon-test-1:

oc logs $(oc get pod -l control-plane=seldon-controller-manager -o name -n seldon-test-1) --follow -n seldon-test-1

  6. Create a SeldonDeployment in seldon-test-2:
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
    app.kubernetes.io/instance: seldon1
    app.kubernetes.io/name: seldon
    app.kubernetes.io/version: v0.5
  name: seldon-model
  namespace: seldon-test-2
spec:
  name: test-deployment
  predictors:
    - componentSpecs:
        - spec:
            containers:
              - image: 'seldonio/mock_classifier:1.6.0'
                name: classifier
      graph:
        children: []
        name: classifier
        type: MODEL
      name: example
      replicas: 1
status: {}

Expected behaviour

The SeldonDeployment created in seldon-test-2 should be managed by the operator deployed in seldon-test-2 and should not be managed by the version of the operator deployed in seldon-test-1. Instead, the logs of the operator deployed in seldon-test-1 show that it is deploying the new resource, and the operator in seldon-test-2 does not indicate any activity.
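
To compare the two operators side by side rather than following one log stream, something like the following could be used. It is a rough sketch: `count_reconcile_lines` is a hypothetical helper, and it assumes the operator's log lines mention the SeldonDeployment by name, which is not confirmed here. The deployment name seldon-controller-manager is taken from the environment output in this report.

```shell
# Hypothetical helper: for each namespace, count the operator log lines that
# mention the given SeldonDeployment name.
# Usage: count_reconcile_lines <sdep-name> <namespace> [<namespace> ...]
count_reconcile_lines() {
  sdep_name="$1"
  shift
  for ns in "$@"; do
    # grep -c exits non-zero when the count is 0, so "|| true" keeps going.
    hits=$(oc logs deployment/seldon-controller-manager -n "$ns" 2>/dev/null \
      | grep -c -- "$sdep_name" || true)
    printf '%s\t%s\n' "$ns" "$hits"
  done
}
```

For the reproduction above, `count_reconcile_lines seldon-model seldon-test-1 seldon-test-2` would be expected to report a non-zero count for seldon-test-1 and 0 for seldon-test-2 when the bug is present.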

Environment

  • Cloud Provider: OpenShift 4.8

  • Kubernetes Cluster Version [Output of kubectl version]

oc version
Client Version: 4.7.0-202106032231.p0.git.e29b355-e29b355
Server Version: 4.8.31
Kubernetes Version: v1.21.6+b82a451
  • Deployed Seldon System Images: [Output of kubectl get --namespace seldon-system deploy seldon-controller-manager -o yaml | grep seldonio]
oc get deployment seldon-controller-manager -o yaml | grep seldonio
                              "image": "seldonio/mock_classifier:1.6.0",
        containerImage: registry.connect.redhat.com/seldonio/seldon-core-operator@sha256:dbea873072acda45863dabc94555d39b9f48670ac04fec3e835c90531f1a2eda
          value: registry.connect.redhat.com/seldonio/seldon-core-executor@sha256:4aab0706cef5ae37e3d62ba3cc4f92bc5d0b0e18d0d953143d25d745696ecc54
          value: registry.connect.redhat.com/seldonio/seldon-engine@sha256:0abffc5f882b16a7ffa9ecfb7d1b362d8e8017b47e94b4b72b10dce74daeec65
          value: registry.connect.redhat.com/seldonio/storage-initializer@sha256:554547229653bf1ebedb88c8eec40e63c8282146e0a0ea14f2d47f000004439f
          value: registry.connect.redhat.com/seldonio/sklearnserver@sha256:0a68b243b28d2dc273a3278393e9f986b6ed5fc29aea7b9f3f343bf9efc5ac8e
          value: registry.connect.redhat.com/seldonio/xgboostserver@sha256:0ee0001730d21ca636417824655d74d423d4492d7fed28befc74c23caf0cc4c8
          value: registry.connect.redhat.com/seldonio/mlflowserver@sha256:3381cad85a16434f48cd65afbcfc65f2693bee46a149aa143a4088c9f9d899a2
          value: registry.connect.redhat.com/seldonio/tfproxy@sha256:407932f2d8e670bb4d3b9f6670a687039fae8d79312269d4da2677b73dc3e301
          value: registry.connect.redhat.com/seldonio/tensorflow-serving@sha256:04d1eee0208ca0e64ae277197cb1ddff4c0ee143a712a7f7a30faec397239dfc
          value: registry.connect.redhat.com/seldonio/alibiexplainer@sha256:6ec697ad5187639712701454198a5f6afa1f662d3c127641c06b4322f391d6dd
          value: registry.connect.redhat.com/seldonio/mock-classifier@sha256:ea78453871e656b71ec9ce4660623a58938d0492fb8b660cc36a1943c768ce4d
          value: docker.io/seldonio/engine:1.12.0
          value: seldonio/seldon-core-executor:1.12.0
        image: registry.connect.redhat.com/seldonio/seldon-core-operator@sha256:dbea873072acda45863dabc94555d39b9f48670ac04fec3e835c90531f1a2eda

Model Details

Using the default example model

strangiato • Feb 24 '22 22:02

Hi @strangiato,

Thanks for bringing this to our attention. Is the namespace installation something you are actively looking to use?

RafalSkolasinski • Feb 25 '22 18:02

Hi Rafal, yes this is the default deployment strategy when Seldon is deployed from OpenDataHub on OpenShift.

strangiato • Feb 25 '22 18:02

Interesting. Is it possible to deploy on OpenDataHub using the "All namespaces on the cluster" option in the meantime?

RafalSkolasinski • Feb 25 '22 18:02

The ODH operator itself is generally deployed as a cluster-scoped operator, but when a user chooses to deploy Seldon, it deploys Seldon as a namespace-scoped operator in that specific user's namespace.

strangiato • Feb 25 '22 18:02

For the sake of documentation, I created a corresponding issue for the ODH project here:

https://issues.redhat.com/browse/ODH-608

strangiato • Feb 25 '22 19:02

So users of ODH cannot install the Seldon Operator cluster-wide, then. In the meantime, could cluster administrators install both ODH and the Seldon Operator (available in all namespaces), so that ODH users just create SeldonDeployments?

RafalSkolasinski • Feb 25 '22 19:02

Yeah, that was the workaround that I ended up implementing as an immediate resolution of the issue for my specific use case.
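
For anyone wanting to script that workaround, a cluster-wide install might look roughly like this. It is a sketch under assumptions, not the exact setup used: `install_seldon_cluster_wide` is a hypothetical helper, and it relies on the default OpenShift openshift-operators namespace, whose existing OperatorGroup targets all namespaces. The package, channel, and catalog source are the ones from the reproduction steps. Any namespace-scoped Subscriptions and OperatorGroups would presumably need to be removed first.

```shell
# Hypothetical helper: subscribe to the Seldon operator once, cluster-wide.
# Assumes the default "openshift-operators" namespace, which already has an
# OperatorGroup covering all namespaces, so no OperatorGroup is created here.
install_seldon_cluster_wide() {
  oc apply -f - <<'EOF'
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: seldon-operator-certified
  namespace: openshift-operators
spec:
  channel: stable
  installPlanApproval: Automatic
  name: seldon-operator-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
EOF
}
```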

strangiato • Feb 25 '22 19:02

@strangiato Is this still an issue for you, or is the workaround OK?

ukclivecox • Apr 11 '22 14:04

The workaround is fine for now, but I would still consider this a bug and a potential security vulnerability for anyone installing in namespaced mode.

strangiato • Apr 11 '22 17:04

We are seeing this on Seldon Core Operator 1.16.0 on GKE 1.24 and 1.25 in namespaced scope, exactly as originally stated:

If the first version of the operator is uninstalled, any SeldonDeployment objects created in the second namespace where the original operator is installed will fail with a webhook error pointing to the non-existent service in the original namespace.

CloudMarc • May 03 '23 14:05