seldon-core charm stable backport is necessary to make an example run

Open nobuto-m opened this issue 3 years ago • 2 comments

juju deploy kubeflow pulls the stable seldon-core charm (revision 52). However, that revision fails to run a simple SeldonDeployment example.

The example works with the edge channel of the charm. Several commits made since the stable revision look related (linked below), so they need to be made available in the stable charm.

https://github.com/canonical/seldon-core-operator/pull/14
https://github.com/canonical/seldon-core-operator/issues/13

$ juju info seldon-core
name: seldon-core
charm-id: ZGHtHpN4TqAzrUlh9aG1SWxXenopHFRH
...
channels: |
  latest/stable:     52  2022-01-25  (52)  1MB
  latest/candidate:  ↑
  latest/beta:       ↑
  latest/edge:       58  2022-06-01  (58)  7MB

https://www.kubeflow.org/docs/external-add-ons/serving/seldon/#simple-example

$ cat <<EOF | kubectl create -n admin -f -
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
spec:
  name: test-deployment
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier_rest:1.3
          name: classifier
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    name: example
    replicas: 1
EOF
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "v1.vseldondeployment.kb.io": Post "https://seldon-webhook-service.kubeflow.svc:4443/validate-machinelearning-seldon-io-v1-seldondeployment?timeout=30s": dial tcp 10.152.183.249:4443: connect: connection refused

nobuto-m, Jun 11 '22 16:06

This was tricky: juju refresh seldon-controller-manager --channel edge on its own didn't solve the problem. It required either a fresh redeployment or a manual edit to the service definition.

$ juju refresh seldon-controller-manager --channel edge
-> seldon-core revision 52 to 58

$ juju status seldon-controller-manager
Model     Controller          Cloud/Region        Version  SLA          Timestamp
kubeflow  microk8s-localhost  microk8s/localhost  2.9.31   unsupported  13:01:41Z

App                        Version                Status  Scale  Charm        Channel  Rev  Address        Exposed  Message
seldon-controller-manager  res:oci-image@047f2fc  active      1  seldon-core  edge      58  10.152.183.50  no       

Unit                          Workload  Agent  Address     Ports              Message
seldon-controller-manager/1*  active    idle   10.1.60.76  8080/TCP,4443/TCP  

$ microk8s kubectl -n kubeflow describe service/seldon-webhook-service
Name:              seldon-webhook-service
Namespace:         kubeflow
Labels:            app=seldon
                   app.juju.is/created-by=seldon-controller-manager
                   app.kubernetes.io/instance=seldon-core
                   app.kubernetes.io/version=1.9.0
Annotations:       <none>
Selector:          app.kubernetes.io/name=seldon-controller-manager,control-plane=seldon-controller-manager
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.152.183.163
IPs:               10.152.183.163
Port:              <unset>  4443/TCP
TargetPort:        4443/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>

-> control-plane=seldon-controller-manager was still in the Selector after the refresh, and Endpoints was <none>.
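
An empty Endpoints list usually means that no pod carries every label in the Service selector. The sketch below is one way to check that side by side. It is not part of the charm or of the original report: the run helper, the DRY_RUN switch, and the function name are made up for illustration, and it assumes the pod carries the app.kubernetes.io/name=seldon-controller-manager label, as the selector shown after the manual edit suggests.

```shell
#!/bin/sh
# Names taken from the output in this thread.
NAMESPACE="kubeflow"
SERVICE="seldon-webhook-service"
APP="seldon-controller-manager"

# run (hypothetical helper): execute a command, or only print it
# when DRY_RUN is set to a non-empty value.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

show_selector_mismatch() {
  # 1. Labels the Service requires of a backing pod.
  run microk8s kubectl -n "$NAMESPACE" get service "$SERVICE" \
    -o jsonpath='{.spec.selector}{"\n"}'
  # 2. Labels the pods actually carry (assumed name label, see above).
  run microk8s kubectl -n "$NAMESPACE" get pods \
    -l "app.kubernetes.io/name=$APP" --show-labels
  # 3. Endpoints stay empty while no pod matches every selector label.
  run microk8s kubectl -n "$NAMESPACE" get endpoints "$SERVICE"
}
```

Calling show_selector_mismatch runs the three queries; with DRY_RUN=1 it only prints the commands. If the selector lists control-plane but the pod labels do not, that would explain the connection refused error from the webhook.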

$ microk8s kubectl -n kubeflow edit service/seldon-webhook-service

$ microk8s kubectl -n kubeflow describe service/seldon-webhook-service
Name:              seldon-webhook-service
Namespace:         kubeflow
Labels:            app=seldon
                   app.juju.is/created-by=seldon-controller-manager
                   app.kubernetes.io/instance=seldon-core
                   app.kubernetes.io/version=1.9.0
Annotations:       <none>
Selector:          app.kubernetes.io/name=seldon-controller-manager
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.152.183.163
IPs:               10.152.183.163
Port:              <unset>  4443/TCP
TargetPort:        4443/TCP
Endpoints:         10.1.60.76:4443
Session Affinity:  None
Events:            <none>

-> After removing control-plane=seldon-controller-manager from the Selector by hand, the service picked up an endpoint (10.1.60.76:4443) and the Seldon deployment started working.
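
For anyone who prefers a non-interactive equivalent of the manual edit, a JSON patch that drops only the stale selector key might look like the sketch below. This is an assumption-laden workaround, not the fix shipped in the charm: the run helper, the DRY_RUN switch, and the function name are illustrative, and it has not been verified against this deployment.

```shell
#!/bin/sh
# Names taken from the output in this thread.
NAMESPACE="kubeflow"
SERVICE="seldon-webhook-service"

# JSON Patch (RFC 6902) that removes only the "control-plane" key from
# spec.selector and leaves the other selector labels untouched.
PATCH='[{"op": "remove", "path": "/spec/selector/control-plane"}]'

# run (hypothetical helper): execute a command, or only print it
# when DRY_RUN is set to a non-empty value.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

remove_stale_selector() {
  run microk8s kubectl -n "$NAMESPACE" patch service "$SERVICE" \
    --type=json -p "$PATCH"
}
```

Set DRY_RUN=1 to preview the command before calling remove_stale_selector for real, then re-run the describe command above to confirm that Endpoints is no longer <none>. The patch fails if the control-plane key is not present in the selector, which makes it safe to run on a service that has already been fixed.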

nobuto-m, Jun 12 '22 13:06

This should be resolved by the end of next week. We will be doing a patch release of CKF 1.4 and releasing it to the ch:kubeflow/1.4/* channels (as well as to ch:kubeflow/latest/stable, at least until it gets supplanted in a month or two).

ca-scribner, Jun 23 '22 14:06

This is resolved in the current release. If it recurs, please reopen.

ca-scribner, Oct 13 '22 13:10