cloudflow

Service account is missing when Spark Operator cleanup job is run

Open · michaelpnash opened this issue · 0 comments

A user tried to re-install Cloudflow using version 1.3.2 and faced this issue:

“I just tried setting it up on GKE (v1.15.9-gke.22) with helm 2.14.3 and it gets consistently stuck during deployment of the Spark operator.

Error creating: pods "cloudflow-sparkoperator-webhook-cleanup-" is forbidden: error looking up service account cloudflow/cloudflow-spark-operator: serviceaccount "cloudflow-spark-operator" not found

I tried un-installing it and going back to 1.3.1, but that installer gets stuck at the same spot. Sure enough, kubectl get serviceaccount -n cloudflow does not list cloudflow-spark-operator either...”

He is probably hitting this bug: https://github.com/helm/charts/pull/21679

The problem is described as follows:

The init job creates a secret mounted into the spark-operator pod.

So the job must be created before the spark-operator deployment.

The init job requires permissions granted by the spark-operator service account.

So the job must be created after the spark-operator service account.

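This shows that the init job cannot be a Helm hook. Helm hooks are either pre-install, which run before any of the release's resources (including the service account) are created, or post-install, which run after all of them (including the deployment) are created. Neither places the job between the service account and the deployment.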
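To make the ordering argument concrete, here is a small Python model. It is a sketch, not Helm code. It assumes the documented Helm behavior that pre-install hooks run before any release resources are created and post-install hooks run after all of them are loaded. The resource labels `serviceaccount`, `init-job` and `deployment` and the helper functions are hypothetical. The model also ignores Helm's kind-based ordering of ordinary resources.

```python
# Hypothetical model of the hook-ordering problem; labels are illustrative only.
ORDINARY_RESOURCES = ["serviceaccount", "deployment"]


def install_order(job_phase):
    """Return the creation order when the init job is a hook in job_phase."""
    if job_phase == "pre-install":
        # Pre-install hooks run before any ordinary release resource is created.
        return ["init-job"] + ORDINARY_RESOURCES
    if job_phase == "post-install":
        # Post-install hooks run after all ordinary release resources exist.
        return ORDINARY_RESOURCES + ["init-job"]
    raise ValueError(f"unknown hook phase: {job_phase}")


def satisfies_constraints(order):
    """Check both requirements from the issue against a creation order."""
    pos = {name: i for i, name in enumerate(order)}
    # The job needs the service account's permissions, so it must come after it.
    job_after_sa = pos["init-job"] > pos["serviceaccount"]
    # The job creates the secret the deployment mounts, so it must come before it.
    job_before_deploy = pos["init-job"] < pos["deployment"]
    return job_after_sa and job_before_deploy


for phase in ("pre-install", "post-install"):
    print(phase, satisfies_constraints(install_order(phase)))
```

Only an order that places the job between the service account and the deployment, such as `["serviceaccount", "init-job", "deployment"]`, passes the check. No single hook phase produces that order.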

This is fixed in version 0.6.11 of the Helm chart. We currently use 0.6.7, which still has the related hook annotations (https://github.com/helm/charts/blob/1488955c16e818240b7de96c644d52206f428951/incubator/sparkoperator/templates/webhook-cleanup-job.yaml) and therefore contains the bug. I recommend we upgrade the Helm chart to the latest version.

michaelpnash, Apr 23 '20 18:04