cloudflow
Service account is missing when Spark Operator cleanup job is run
A user tried to re-install Cloudflow using version 1.3.2 and faced this issue:
“I just tried setting it up on GKE (v1.15.9-gke.22) with helm 2.14.3 and it gets consistently stuck during deployment of the Spark operator.
Error creating: pods "cloudflow-sparkoperator-webhook-cleanup-" is forbidden: error looking up service account cloudflow/cloudflow-spark-operator: serviceaccount "cloudflow-spark-operator" not found
I tried un-installing it and going back to 1.3.1, but that installer gets stuck at the same spot.
Sure enough, kubectl get serviceaccount -n cloudflow does not list cloudflow-spark-operator either...”
He is probably hitting this bug: https://github.com/helm/charts/pull/21679
The problem is described as follows:
The init job creates a secret mounted into the spark-operator pod.
So the job must be created before the spark-operator deployment.
The init job requires permissions granted by the spark-operator service account.
So the job must be created after the spark-operator service account.
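This clearly shows that the init job cannot be a Helm hook: a hook runs either pre-install (before the service account exists) or post-install (after the deployment has been created), so neither phase satisfies both ordering requirements.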
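The ordering argument above can be checked with a small model. This is an illustrative Python sketch, not Helm code: it assumes, as the description does, that a pre-install hook runs before every regular chart resource and a post-install hook runs after all of them. The resource names and the helpers `install_order`, `satisfies`, and `regular_resource_order` are hypothetical.

```python
# Simplified model of Helm install ordering (an assumption for illustration):
# pre-install hooks run before all regular resources, post-install hooks after.
REGULAR_RESOURCES = ["serviceaccount", "deployment"]

# Constraints taken from the problem description, as (earlier, later) pairs.
CONSTRAINTS = [
    ("init-job", "deployment"),      # the job creates the secret the pod mounts
    ("serviceaccount", "init-job"),  # the job needs the service account's permissions
]


def install_order(job_phase):
    """Return the creation order when the init job is a hook in the given phase."""
    if job_phase == "pre-install":
        return ["init-job"] + REGULAR_RESOURCES
    if job_phase == "post-install":
        return REGULAR_RESOURCES + ["init-job"]
    raise ValueError(f"unknown hook phase: {job_phase}")


def satisfies(order):
    """Check that every (earlier, later) constraint holds in the given order."""
    return all(order.index(a) < order.index(b) for a, b in CONSTRAINTS)


def regular_resource_order():
    """An order that is only possible when the job is not tied to a hook phase."""
    return ["serviceaccount", "init-job", "deployment"]


if __name__ == "__main__":
    for phase in ("pre-install", "post-install"):
        print(phase, satisfies(install_order(phase)))
    print("regular resource", satisfies(regular_resource_order()))
```

Under these assumptions both hook phases violate one of the two constraints, while an order with the job between the service account and the deployment satisfies both. The last case only shows that a consistent order exists; it does not claim to describe how the fixed chart or Helm itself sequences resources.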
This is fixed in version 0.6.11 of the Helm chart, while we currently use 0.6.7, which still has the related hook annotations: https://github.com/helm/charts/blob/1488955c16e818240b7de96c644d52206f428951/incubator/sparkoperator/templates/webhook-cleanup-job.yaml
and thus contains the bug. I recommend we upgrade the Helm chart to the latest version.
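One way to confirm whether a given chart template still declares hook annotations is to scan it for the `helm.sh/hook` annotation key. A minimal sketch, assuming the template text is already available as a string. The sample manifest is a made-up fragment for illustration, not the contents of the linked file, and `find_hook_annotations` is a hypothetical helper.

```python
import re

# Matches annotation lines such as:  "helm.sh/hook": pre-delete
# The key may be quoted or unquoted; suffixes like -delete-policy are included.
HOOK_LINE = re.compile(r'^\s*["\']?(helm\.sh/hook[\w-]*)["\']?\s*:\s*(.+?)\s*$')


def find_hook_annotations(manifest_text):
    """Return (key, value) pairs for every helm.sh/hook* annotation line found."""
    found = []
    for line in manifest_text.splitlines():
        match = HOOK_LINE.match(line)
        if match:
            found.append((match.group(1), match.group(2).strip("\"'")))
    return found


# Hypothetical manifest fragment, used only to exercise the scanner.
SAMPLE = """\
apiVersion: batch/v1
kind: Job
metadata:
  name: example-cleanup
  annotations:
    "helm.sh/hook": pre-delete
    "helm.sh/hook-delete-policy": hook-succeeded
"""

if __name__ == "__main__":
    for key, value in find_hook_annotations(SAMPLE):
        print(f"{key} = {value}")
```

An empty result for a template suggests it is no longer run as a hook. This is a line-based text scan rather than a YAML parse, so it is only a quick check.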