Following the docs for kind + Eirini results in a non-working cluster
**Describe the bug**
Follow the docs here: https://kubecf.io/docs/tutorials/deploy-kind/
(take this PR into account: https://github.com/cloudfoundry-incubator/kubecf-docs/pull/37)
When all pods are up and running, try to push an app (e.g. https://github.com/scf-samples/dizzylizard).
After staging is done, the app pods don't start. They fail with `Error: ErrImagePull`:
```
Events:
  Type     Reason     Age                  From                           Message
  ----     ------     ----                 ----                           -------
  Normal   Scheduled  <unknown>                                           Successfully assigned eirini/dizzy-default-0b5c24131a-3 to kubecf-control-plane
  Normal   Pulling    23m (x4 over 24m)    kubelet, kubecf-control-plane  Pulling image "127.0.0.1:31666/cloudfoundry/1056bd51-07bd-4e84-9a2c-f907e68d73b4:362379573001a18c2b4671ae9fbeb9ba17be290e"
  Warning  Failed     23m (x4 over 24m)    kubelet, kubecf-control-plane  Failed to pull image "127.0.0.1:31666/cloudfoundry/1056bd51-07bd-4e84-9a2c-f907e68d73b4:362379573001a18c2b4671ae9fbeb9ba17be290e": rpc error: code = Unknown desc = failed to pull and unpack image "127.0.0.1:31666/cloudfoundry/1056bd51-07bd-4e84-9a2c-f907e68d73b4:362379573001a18c2b4671ae9fbeb9ba17be290e": failed to resolve reference "127.0.0.1:31666/cloudfoundry/1056bd51-07bd-4e84-9a2c-f907e68d73b4:362379573001a18c2b4671ae9fbeb9ba17be290e": unexpected status code [manifests 362379573001a18c2b4671ae9fbeb9ba17be290e]: 400 Bad Request
  Warning  Failed     23m (x4 over 24m)    kubelet, kubecf-control-plane  Error: ErrImagePull
  Warning  Failed     9m58s (x64 over 24m) kubelet, kubecf-control-plane  Error: ImagePullBackOff
  Normal   BackOff    4m57s (x86 over 24m) kubelet, kubecf-control-plane  Back-off pulling image "127.0.0.1:31666/cloudfoundry/1056bd51-07bd-4e84-9a2c-f907e68d73b4:362379573001a18c2b4671ae9fbeb9ba17be290e"
```
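The 400 comes back from the in-cluster registry on `127.0.0.1:31666`, so it may help to probe that registry directly from the kind node. A small sketch that splits the failing reference (taken verbatim from the events above) into registry, repository, and tag; the `curl` line in the comment is only a suggestion of how one could then reproduce the 400 manually:

```shell
#!/bin/sh
# Split an OCI image reference (host:port/repo:tag) into its parts so the
# registry can be probed directly. The reference is the one from the events.
ref='127.0.0.1:31666/cloudfoundry/1056bd51-07bd-4e84-9a2c-f907e68d73b4:362379573001a18c2b4671ae9fbeb9ba17be290e'

registry=${ref%%/*}   # host:port before the first slash
rest=${ref#*/}        # repo:tag after the first slash
repo=${rest%:*}       # everything up to the last colon
tag=${rest##*:}       # everything after the last colon

echo "registry=$registry"
echo "repo=$repo"
echo "tag=$tag"

# From inside the kind node (docker exec -it kubecf-control-plane bash) one
# could then try to fetch the manifest that the kubelet fails on, e.g.:
#   curl -ik "https://$registry/v2/$repo/manifests/$tag"
```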
**To Reproduce**
See above.

**Expected behavior**
The application pod should be up and running.

**Environment**
kubecf v2.6.1 on kind v0.9.0 (go1.15.2 linux/amd64)
I hit this same issue with kubecf v2.6.1 on kind v0.9.0 (go1.13 linux/amd64). I was wondering if it had something to do with the cert instructions. The command does succeed, but it differs from the one at the bottom of https://kubecf.io/docs/tutorials/deploy-k3s (the cert location is the `bits-service-ssl` secret rather than the node's system store). Could be a red herring (kind vs k3s), but I thought it worth mentioning.
I replaced the opi image in the eirini pod with `jimmykarily/opi`, an image I built from the same code but with the code that deletes the staging job after completion disabled (more here: https://github.com/cloudfoundry-incubator/kubecf/issues/1323#issuecomment-692530753). It seems that the uploader init container never succeeds:
```
┌──────────────────── Containers(eirini/dizzylizard-default-6cmpb)[3] ────────────────────┐
│ NAME↑                PF  IMAGE                                                         READY  STATE      INIT   RESTARTS  PROBES(L:R)  PORTS  AGE   │
│ opi-task-downloader  ●   registry.suse.com/cap-staging/recipe-downloader:1.8.0-24.56   true   Completed  true   0         off:off             5m27s │
│ opi-task-executor    ●   registry.suse.com/cap-staging/recipe-executor:1.8.0-24.56     true   Completed  true   0         off:off             5m27s │
│ opi-task-uploader    ●   registry.suse.com/cap-staging/recipe-uploader:1.8.0-24.56     false  Completed  false  0         off:off             5m27s │
└─────────────────────────────────────────────────────────────────────────────────────────┘
```
but it doesn't print an error either.
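The same information the k9s view shows can be pulled programmatically from the pod's `initContainerStatuses` (on the real cluster via `kubectl get pod -n eirini dizzylizard-default-6cmpb -o json`). A dependency-free sketch against a trimmed, hypothetical sample of that JSON, extracting each init container's exit code:

```shell
#!/bin/sh
# Sketch: extract init-container names and exit codes from pod status JSON.
# The JSON below is a trimmed, hypothetical sample; on a real cluster pipe in
#   kubectl get pod -n eirini <pod> -o json
cat > status.json <<'EOF'
{"status":{"initContainerStatuses":[
 {"name":"opi-task-downloader","state":{"terminated":{"exitCode":0}}},
 {"name":"opi-task-executor","state":{"terminated":{"exitCode":0}}},
 {"name":"opi-task-uploader","state":{"terminated":{"exitCode":1}}}]}}
EOF

# jq would be cleaner; plain sed keeps this free of extra dependencies.
result=$(sed -n 's/.*"name":"\([^"]*\)".*"exitCode":\([0-9]*\).*/\1 exit=\2/p' status.json)
echo "$result"
rm status.json
```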
I can't tell for sure whether the staging container worked or not. What I do see, though, is an error in the singleton blobstore pod:
```
~/dizzylizard (master)*$ kubectl logs -n kubecf singleton-blobstore-0 -c blobstore-nginx
nginx: [alert] could not open error log file: open() "/var/vcap/packages/nginx_webdav/logs/error.log" failed (13: Permission denied)
```
Exec-ing into the pod shows that the nginx process is started by the `vcap` user, but that directory is owned by root. Not sure if it's relevant to the failed staging, but it's something to look at for sure (it may have to do with changes in the stemcell).
Also, this commit may be relevant:
https://github.com/cloudfoundry-incubator/kubecf/commit/31dc889eaf7b71b02e396d59f959275f9927756b#diff-6cef92220d2f63c1a73bbbeca21e2d1a7c08d210fbf109a08ee08bae91e723a1
I tried deploying v2.6.1 using the make targets in kubecf:

```
$ git checkout v2.6.1
$ make kind-start
$ make all
```
and after all pods are up and running, pushing the example app (dizzylizard) works. So I realized it must have something to do with how kind is set up, because `make kind-start` does some preparation on the cluster beyond simply calling `kind create cluster`: https://github.com/cloudfoundry-incubator/kubecf/blob/master/scripts/kind-start.sh
To verify, I created a fresh cluster with `make kind-start` and then followed the docs to deploy kubecf as in the description (so the only difference from the reproduction steps was the way the kind cluster was created). Pushing the app works in this case.
So it seems that something `make kind-start` does is necessary to make things work. We need to find out what that is and document it on the docs page.
OK, new data: simply using `kind create cluster --image "kindest/node:v1.17.5"` and then following the docs makes it work. The problem is the Kubernetes version. The make target pins 1.17.5, while the command in the docs pulls `latest` (which doesn't work). Is 1.19 already supported by kubecf, @viovanov? If not, I will simply update the docs to use a supported version (preferably 1.17.5, which is known to work).
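Until the docs are fixed, a pre-flight check could catch this before a long deployment. A hypothetical helper (the `supported_minor` cutoff and version strings are assumptions based on this thread: 1.17 is known to work, 1.19 is not) that flags cluster versions newer than the one the make target pins; on a real cluster the version would come from something like `kubectl version -o json`:

```shell
#!/bin/sh
# Hypothetical pre-flight check: warn before deploying kubecf if the cluster's
# minor version is newer than the one the make target pins (1.17, known to work).
supported_minor=17

check_version() {
  ver=${1#v}                          # strip leading "v": v1.19.1 -> 1.19.1
  minor=$(echo "$ver" | cut -d. -f2)  # second dot-separated field
  if [ "$minor" -le "$supported_minor" ]; then
    echo "$1: ok"
  else
    echo "$1: untested with kubecf v2.6.1"
  fi
}

check_version v1.17.5
check_version v1.19.1
```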