bundle-kubeflow
seldon-core charm: a backport to stable is needed to make the example run
juju deploy kubeflow pulls the stable seldon-core charm (revision 52), which fails to run a simple SeldonDeployment example.
With the edge channel of the charm, the example works. Looking through the commits, the following look related, so we need them to be available in the stable charm:
https://github.com/canonical/seldon-core-operator/pull/14
https://github.com/canonical/seldon-core-operator/issues/13
$ juju info seldon-core
name: seldon-core
charm-id: ZGHtHpN4TqAzrUlh9aG1SWxXenopHFRH
...
channels: |
latest/stable: 52 2022-01-25 (52) 1MB
latest/candidate: ↑
latest/beta: ↑
latest/edge: 58 2022-06-01 (58) 7MB
https://www.kubeflow.org/docs/external-add-ons/serving/seldon/#simple-example
$ cat <<EOF | kubectl create -n admin -f -
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
spec:
  name: test-deployment
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier_rest:1.3
          name: classifier
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    name: example
    replicas: 1
EOF
Error from server (InternalError): error when creating "STDIN":
Internal error occurred: failed calling webhook
"v1.vseldondeployment.kb.io": Post
"https://seldon-webhook-service.kubeflow.svc:4443/validate-machinelearning-seldon-io-v1-seldondeployment?timeout=30s": dial tcp
10.152.183.249:4443: connect: connection refused
This was tricky because juju refresh seldon-controller-manager --channel edge did not solve the problem on its own. It took either a fresh redeployment or a manual edit of the service definition.
$ juju refresh seldon-controller-manager --channel edge
-> seldon-core revision 52 to 58
$ juju status seldon-controller-manager
Model     Controller          Cloud/Region        Version  SLA          Timestamp
kubeflow  microk8s-localhost  microk8s/localhost  2.9.31   unsupported  13:01:41Z

App                        Version                Status  Scale  Charm        Channel  Rev  Address        Exposed  Message
seldon-controller-manager  res:oci-image@047f2fc  active      1  seldon-core  edge      58  10.152.183.50  no

Unit                          Workload  Agent  Address     Ports              Message
seldon-controller-manager/1*  active    idle   10.1.60.76  8080/TCP,4443/TCP
$ microk8s kubectl -n kubeflow describe service/seldon-webhook-service
Name:              seldon-webhook-service
Namespace:         kubeflow
Labels:            app=seldon
                   app.juju.is/created-by=seldon-controller-manager
                   app.kubernetes.io/instance=seldon-core
                   app.kubernetes.io/version=1.9.0
Annotations:       <none>
Selector:          app.kubernetes.io/name=seldon-controller-manager,control-plane=seldon-controller-manager
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.152.183.163
IPs:               10.152.183.163
Port:              <unset>  4443/TCP
TargetPort:        4443/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>
-> The selector still included control-plane=seldon-controller-manager after the refresh, and the service had no endpoints.
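A Service only gets endpoints when at least one ready pod carries every label listed in its selector, so one way to confirm the mismatch is to print the selector next to the pod labels. The following is a minimal sketch, not part of the original report; the function name and the NS/SVC variables are made up for illustration, and it assumes the pods of the refreshed charm still carry the app.kubernetes.io/name label.

```shell
# Hypothetical helper: compare the Service selector with the pod labels.
NS=kubeflow
SVC=seldon-webhook-service

show_selector_and_pod_labels() {
  # Selector the Service uses to pick its backing pods.
  microk8s kubectl -n "$NS" get service "$SVC" -o jsonpath='{.spec.selector}'
  echo
  # Labels actually present on the charm's pods (assumed label key).
  microk8s kubectl -n "$NS" get pods \
    -l app.kubernetes.io/name=seldon-controller-manager --show-labels
  # Endpoints stay empty while no pod satisfies every selector key.
  microk8s kubectl -n "$NS" get endpoints "$SVC"
}
```

Running show_selector_and_pod_labels on the affected cluster should make it visible whether the control-plane key in the selector has a matching label on the pod.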
$ microk8s kubectl -n kubeflow edit service/seldon-webhook-service
$ microk8s kubectl -n kubeflow describe service/seldon-webhook-service
Name:              seldon-webhook-service
Namespace:         kubeflow
Labels:            app=seldon
                   app.juju.is/created-by=seldon-controller-manager
                   app.kubernetes.io/instance=seldon-core
                   app.kubernetes.io/version=1.9.0
Annotations:       <none>
Selector:          app.kubernetes.io/name=seldon-controller-manager
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.152.183.163
IPs:               10.152.183.163
Port:              <unset>  4443/TCP
TargetPort:        4443/TCP
Endpoints:         10.1.60.76:4443
Session Affinity:  None
Events:            <none>
-> After removing control-plane=seldon-controller-manager from the selector by hand, the service picked up the endpoint 10.1.60.76:4443 and the Seldon deployment started working.
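The same manual edit can probably be done non-interactively with a JSON patch instead of kubectl edit. This is a sketch under the assumption that nothing re-applies the old selector afterwards; the function and variable names are hypothetical and it was not part of the original workaround.

```shell
# Hypothetical helper: drop the stale selector key without opening an editor.
NS=kubeflow
SVC=seldon-webhook-service
# JSON Patch (RFC 6902) that removes only the control-plane selector key.
PATCH='[{"op":"remove","path":"/spec/selector/control-plane"}]'

remove_control_plane_selector() {
  microk8s kubectl -n "$NS" patch service "$SVC" --type=json -p "$PATCH"
  # Endpoints should now list the controller pod on port 4443.
  microk8s kubectl -n "$NS" get endpoints "$SVC"
}
```

Call remove_control_plane_selector once after the refresh. The remove operation fails if the key is already absent, which makes an accidental second run harmless.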
This should be resolved by the end of next week. We will be doing a patch release of CKF 1.4 and publishing it to the ch:kubeflow/1.4/* channels (as well as to ch:kubeflow/latest/stable, at least until it gets supplanted in a month or two).
This is resolved in the current release. If it recurs, please reopen.