Certificates are generated by operator rather than gencerts.sh

Open ChenYi015 opened this issue 9 months ago • 1 comments

Purpose of this PR

Close #1959

Proposed changes:

  • hack/gencerts.sh will no longer be used to generate certificates; the operator is responsible for generating the CA certificate and server certificate
  • delete webhook-init-job.yaml since webhook secret will be created by helm and updated by spark operator
  • delete webhook-cleanup-job.yaml since webhook secret will be deleted by helm
  • spark operator rbac resources are managed by helm rather than helm hooks since there is no webhook init job anymore
  • update Dockerfile

Change Category

Indicate the type of change by marking the applicable boxes:

  • [ ] Bugfix (non-breaking change which fixes an issue)
  • [ ] Feature (non-breaking change which adds functionality)
  • [x] Breaking change (fix or feature that could affect existing functionality)
  • [x] Documentation update

Rationale

It would be better for the spark operator RBAC resources and webhook secrets to be managed by helm rather than by helm hooks.

Checklist

Before submitting your PR, please review the following:

  • [x] I have conducted a self-review of my own code.
  • [x] I have updated documentation accordingly.
  • [x] I have added tests that prove my changes are effective or that my feature works.
  • [x] Existing unit tests pass locally with my changes.

Additional Notes

ChenYi015 avatar May 08 '24 11:05 ChenYi015

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign andreyvelich for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

google-oss-prow[bot] avatar May 08 '24 11:05 google-oss-prow[bot]

/assign @vara-bonthu

ChenYi015 avatar May 10 '24 07:05 ChenYi015

@vara-bonthu Could you review this PR, thanks!

ChenYi015 avatar May 13 '24 06:05 ChenYi015

@yuchaoran2011 Could you review this PR, thanks!

ChenYi015 avatar May 29 '24 06:05 ChenYi015

@ChenYi015 Could you resolve the merge conflicts?

yuchaoran2011 avatar Jun 04 '24 05:06 yuchaoran2011

@yuchaoran2011 Rebased and force-pushed.

ChenYi015 avatar Jun 04 '24 06:06 ChenYi015

@vara-bonthu I have updated the related docs and run e2e tests as follows:

  1. Create a kind config kind-config.yaml:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
  1. Create a kind cluster:
kind create cluster --config kind-config.yaml
  1. Build docker image and load into kind cluster:
docker build -t docker.io/kubeflow/spark-operator:local .
kind load docker-image docker.io/kubeflow/spark-operator:local
  1. Install the helm chart with webhook enabled:
helm install spark-operator charts/spark-operator-chart \
    --namespace spark-operator \
    --create-namespace \
    --set image.tag=local \
    --set webhook.enable=true \
    --set enforceQuotaEnforcement=true \
    --set 'sparkJobNamespaces[0]=default'
  1. Inspect the webhook secret to verify that the private keys and certificates are populated correctly:
$ kubectl get secret -n spark-operator -o yaml spark-operator-webhook-certs 
apiVersion: v1
data:
  ca-cert.pem: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURxakNDQXBLZ0F3SUJBZ0lJUU9zWWwzdDhEaVl3RFFZSktvWklodmNOQVFFTEJRQXdVVEVYTUJVR0ExVUUKQ2hNT2MzQmhjbXN0YjNCbGNtRjBiM0l4TmpBMEJnTlZCQU1UTFhOd1lYSnJMVzl3WlhKaGRHOXlMWGRsWW1odgpiMnN0YzNaakxuTndZWEpyTFc5d1pYSmhkRzl5TG5OMll6QWVGdzB5TkRBMk1EVXdNekk0TVRkYUZ3MHpOREEyCk1EVXdNekk0TVRkYU1GRXhGekFWQmdOVkJBb1REbk53WVhKckxXOXdaWEpoZEc5eU1UWXdOQVlEVlFRREV5MXoKY0dGeWF5MXZjR1Z5WVhSdmNpMTNaV0pvYjI5ckxYTjJZeTV6Y0dGeWF5MXZjR1Z5WVhSdmNpNXpkbU13Z2dFaQpNQTBHQ1NxR1NJYjNEUUVCQVFVQUE0SUJEd0F3Z2dFS0FvSUJBUUM4cit5aUZ5eENFOEx4a0lCaitmL3lUcjBVClVtSmZKU2JINHI3L2V4NU16VFRFVzFVYS84MW42VnhrNTRpZlh1YWxMMDUwb1cyMlFLWndGMnJrWGRNVmlwUTgKRlY4cUlWb214M045MWNUajUvUnlEcmdPTUhhZVVJK3ltT0xteWZxUklSQVFXdjluaWxwUGdCOTIybVZPaE5CcAptQ3UxK1dGTVJReHhtZkw1TUkwcFVJaEROSkdCdHl3SUtUbWREQSs3NkRORS9pMzkrSWNoVHJaWTJkZG1WSnEwCjNLVTIxdS93TXVzYWo4S05oVlZpUlRZbTYxVk5rR0t4YlNIdFprZFlLMlJLbmcxR08wMGNaTktYTDM3SjVjek8KcGRhWjFEekliNElCUDJCdE1Nb0s3WW5pREtxRnhaTVZ6VHJoL2J5aVByNHJyYko4em1Eamd2dytrdXh0QWdNQgpBQUdqZ1lVd2dZSXdEZ1lEVlIwUEFRSC9CQVFEQWdYZ01CMEdBMVVkSlFRV01CUUdDQ3NHQVFVRkJ3TUJCZ2dyCkJnRUZCUWNEQWpBTUJnTlZIUk1CQWY4RUFqQUFNRU1HQTFVZEVRUThNRHFDQ1d4dlkyRnNhRzl6ZElJdGMzQmgKY21zdGIzQmxjbUYwYjNJdGQyVmlhRzl2YXkxemRtTXVjM0JoY21zdGIzQmxjbUYwYjNJdWMzWmpNQTBHQ1NxRwpTSWIzRFFFQkN3VUFBNElCQVFBYThIdWxYSkt2RlBCRFVHeTNpMGtqcklmcGIxdG9sa1FReU16YUJ5MFVWTE92CjVOWkpOcUZOYkRxWFpGV1VZYnorY1FDWUJpYmJiWW9mZTg3Z0Q3Vi9MeUJ3WGxvbXRGQmg4Njl3Yk5SUExEb2gKenJBREFtdkZQSWRtNURHZkRlT1lxdldObEU2S2Z6NUh1bldkNkNKdlovRDRKbG5xeWVKTGJNamg0dVZuWHA2NQprNkJLUGtYSC8zK3pFTEEzNnFLcWFwa1FXb3J5dExnWUNxdGdhRmp4OWFndjlyUnI1ZFQrVmRGYk0ySTdwU1d5CnZCK0Jpcmt3T0xVUk55MWhCQkpHdjM1UFVucVVIN2FqWHF1YytqL1N2NHo5OEp5WmIrb1lqL252dFZyc2Npd2wKYUNvZUV1ZTE4VGxYTklKTEJ2T0pTajEzVFNKSVNvR3ZwZnFEcHdrOQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  ca-key.pem: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBNDFBY2l2SGNLQzk5RkJCZDBqdHRTQjdSb3FhVmY1cHM0ZjI2WDhvclQ4Y2dscCtuCjdwckFaeXc1YVFWSXhmSFphNTdlUmMxUVFtVGVTaUtNTE5Nc1h2VWxCWmMyb3Y1M3U3RjN6MGN4TVhsYU5xUEsKSkZZanJpd2lhYlBZdGR3aHA5R3F5RUpPdmdOempzVldBUElBWGkwUW13dnVKeUJOZTMrVFd1djJ0MnhSVmNyNApyYjFtOVFrNHhsL0pIMTI1OTJoQ2hyQTNFNG83em9Zd0cxUWF4RDFDdXNWWG1FcjhoQmowdkxqTDFhY0VOS2poCk9yVE9HQzdUWStrTFlYUU9LdmN4aCtOYmE3bGJGbWN6dHErSkpHU0FyQVpYWGhIUTNkSFNEM09qa3o5V00vRVkKaXhxRTRmRmd0TTh1QWlZTjdtaHF0MmZCTFA4SmpWK1BURWdhRHdJREFRQUJBb0lCQUNFcVhSL0FyaGlHNVQ3NgpMRll5S1gydVVYUGp6a2d4NWRVTFNoZ1R6VUgwa2NLb1JMNUJnZlVMdE15bjRyaE8weVFxcDgrVFp6Um90eTRsCjRFSGlCY1ZOQ3p2SGxrY3R6WlpyREVvSDN4dVMweURKd1FLUU51Q0F1L3lrS3VoTjEvTStXaWFoMWc5UFBac0YKRzhsRGhkNDN3UVorTlI4c1RXSEplVng0dFNTSnVPUTZ6WjRCQnJoMGR5V1hSQVcxZFYxVWV4dzh5RnoxTVFDagpZNGJKY0NKd21sRFFwZTM1NmREZWI5WlBSaVJUQjhadXFNS2FlakVkc1dZRWJhdVMvanhBcThNc0NkWkszMFZmCkdiSlcyT1dzYTJKZzEycnhmL1RsenBPd3JGS2RLZmljWGRwcGVxWlMzbDAwNmcrM3J0Q3VGK2s1aWpDR21nYTIKUHVNaXZNRUNnWUVBN0pzRndYZ0Z6UHhQNUQrazdLR05SWThnaVVZZVRKd0dZb0tyMlZyTEFNWU5DY2gzNzc5bgpwRlN1MUxCY1VHN0xlUDRxWGxhSWV1MVBYN2h4bi91anFYUW40OUc5NmN5TkNCbjBWRUtLS3hhREpoWVlxYitDCnBJeFRWa1JzYWR4L0xMWTRPSXUxQnpiYkJuV3BLV1FLeHh3VFNheGJSbWgwQSttamhuZXVweEVDZ1lFQTlmSVgKUUVwNXZMeUIzakNRaEcvYVZQU0U5N2hvUGRnRCs5WnE2Qjl5RFdtOXVPOXdUQmdUWVZJTk5JbEFyaEQrUFEwWAprTG5KSTBGZGJodmMxN2hCaWltV0s1dDhiWXNnMHRpQ2lIVW1GWHlGN0htbWhhQjJXNGpkd1RPRDRPYTlydTZXCktrd3l3VTBnMURpSE9Pb1lHbXhBT3dmbmZ0WGUyL0k1Z0hlWjd4OENnWUVBMWQ4clRMNlpQN215M2JkSjlUdnkKM3pXSlM0eStSckdpYzlsNlRYYnNtVDVzK3JMaTl5d2xHejRRNnVDZ0VYU1ZLRUZYT3Y4dFR6REQxdHA2bXdwegozZkRKUGYyUmxZejR6cUhuWVdMa1VoNS9YaVlMRlNXdmlkM3VWc1J5Mng0ZE51VmYzSDBzbmVEUUN2N0FjbEdrCkRHY3NhQ1FNUFpDZGpndmJiT2t5VG9FQ2dZRUFxZFAvWmkrSEhHSjJzcnlLTGtrbVZCOThhYW4yb1MyMm9vR08KMUxaU0JSME5HdFNMa0ovWFVnNWNlL2lDcHkrb3Z2TjVZRUJKdVlSN1JYc0w1aEdmZ0EzeldpMUZvRWEvNVpnSAptcjU2QzhBdW9mbm1tTU1TdDJZczZpbnVXTEE4THIwbENCUVJ3QlRJSklMY0xOckl4Z1lWM0MwN0Z3UUxuWWtIClY4UStrVFVDZ1lCWnBvYlBFMGpaY0UvZXVlQz
NmcUlTem90NmRGejNpQmNuMjFqZGxBQWNrNUZGWStrNVViaGcKSHcvNFBTRkMza1ZZTDFCemZoS3huMDMrN2ozazhRdE9MRGRLSy9xMm44WUpnUjhoSzQ2eVljODJoWkM0TWk1QQpCM3RkRFZLRHVoNldtUUg4cCtXRkZiZng5RnQ5SWR3TkJndjE0a3pPd2RnS0d6ZVZHdVgrd0E9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
  server-cert.pem: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURxakNDQXBLZ0F3SUJBZ0lJUU9zWWwzdDhEaVl3RFFZSktvWklodmNOQVFFTEJRQXdVVEVYTUJVR0ExVUUKQ2hNT2MzQmhjbXN0YjNCbGNtRjBiM0l4TmpBMEJnTlZCQU1UTFhOd1lYSnJMVzl3WlhKaGRHOXlMWGRsWW1odgpiMnN0YzNaakxuTndZWEpyTFc5d1pYSmhkRzl5TG5OMll6QWVGdzB5TkRBMk1EVXdNekk0TVRkYUZ3MHpOREEyCk1EVXdNekk0TVRkYU1GRXhGekFWQmdOVkJBb1REbk53WVhKckxXOXdaWEpoZEc5eU1UWXdOQVlEVlFRREV5MXoKY0dGeWF5MXZjR1Z5WVhSdmNpMTNaV0pvYjI5ckxYTjJZeTV6Y0dGeWF5MXZjR1Z5WVhSdmNpNXpkbU13Z2dFaQpNQTBHQ1NxR1NJYjNEUUVCQVFVQUE0SUJEd0F3Z2dFS0FvSUJBUUM4cit5aUZ5eENFOEx4a0lCaitmL3lUcjBVClVtSmZKU2JINHI3L2V4NU16VFRFVzFVYS84MW42VnhrNTRpZlh1YWxMMDUwb1cyMlFLWndGMnJrWGRNVmlwUTgKRlY4cUlWb214M045MWNUajUvUnlEcmdPTUhhZVVJK3ltT0xteWZxUklSQVFXdjluaWxwUGdCOTIybVZPaE5CcAptQ3UxK1dGTVJReHhtZkw1TUkwcFVJaEROSkdCdHl3SUtUbWREQSs3NkRORS9pMzkrSWNoVHJaWTJkZG1WSnEwCjNLVTIxdS93TXVzYWo4S05oVlZpUlRZbTYxVk5rR0t4YlNIdFprZFlLMlJLbmcxR08wMGNaTktYTDM3SjVjek8KcGRhWjFEekliNElCUDJCdE1Nb0s3WW5pREtxRnhaTVZ6VHJoL2J5aVByNHJyYko4em1Eamd2dytrdXh0QWdNQgpBQUdqZ1lVd2dZSXdEZ1lEVlIwUEFRSC9CQVFEQWdYZ01CMEdBMVVkSlFRV01CUUdDQ3NHQVFVRkJ3TUJCZ2dyCkJnRUZCUWNEQWpBTUJnTlZIUk1CQWY4RUFqQUFNRU1HQTFVZEVRUThNRHFDQ1d4dlkyRnNhRzl6ZElJdGMzQmgKY21zdGIzQmxjbUYwYjNJdGQyVmlhRzl2YXkxemRtTXVjM0JoY21zdGIzQmxjbUYwYjNJdWMzWmpNQTBHQ1NxRwpTSWIzRFFFQkN3VUFBNElCQVFBYThIdWxYSkt2RlBCRFVHeTNpMGtqcklmcGIxdG9sa1FReU16YUJ5MFVWTE92CjVOWkpOcUZOYkRxWFpGV1VZYnorY1FDWUJpYmJiWW9mZTg3Z0Q3Vi9MeUJ3WGxvbXRGQmg4Njl3Yk5SUExEb2gKenJBREFtdkZQSWRtNURHZkRlT1lxdldObEU2S2Z6NUh1bldkNkNKdlovRDRKbG5xeWVKTGJNamg0dVZuWHA2NQprNkJLUGtYSC8zK3pFTEEzNnFLcWFwa1FXb3J5dExnWUNxdGdhRmp4OWFndjlyUnI1ZFQrVmRGYk0ySTdwU1d5CnZCK0Jpcmt3T0xVUk55MWhCQkpHdjM1UFVucVVIN2FqWHF1YytqL1N2NHo5OEp5WmIrb1lqL252dFZyc2Npd2wKYUNvZUV1ZTE4VGxYTklKTEJ2T0pTajEzVFNKSVNvR3ZwZnFEcHdrOQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  server-key.pem: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcFFJQkFBS0NBUUVBdksvc29oY3NRaFBDOFpDQVkvbi84azY5RkZKaVh5VW14K0srLzNzZVRNMDB4RnRWCkd2L05aK2xjWk9lSW4xN21wUzlPZEtGdHRrQ21jQmRxNUYzVEZZcVVQQlZmS2lGYUpzZHpmZFhFNCtmMGNnNjQKRGpCMm5sQ1BzcGppNXNuNmtTRVFFRnIvWjRwYVQ0QWZkdHBsVG9UUWFaZ3J0ZmxoVEVVTWNabnkrVENOS1ZDSQpRelNSZ2Jjc0NDazVuUXdQdStnelJQNHQvZmlISVU2MldOblhabFNhdE55bE50YnY4RExyR28vQ2pZVlZZa1UyCkp1dFZUWkJpc1cwaDdXWkhXQ3RrU3A0TlJqdE5IR1RTbHk5K3llWE16cVhXbWRROHlHK0NBVDlnYlRES0N1MkoKNGd5cWhjV1RGYzA2NGYyOG9qNitLNjJ5Zk01ZzQ0TDhQcExzYlFJREFRQUJBb0lCQVFDNEFqaWF1azZIQWc2UwoxWURmL3VZRHY1WFZRNko3ZHhlaXh4WE13SnlEK1hzRUlxMlVidkk1Ni9JVzFWVC9WdVZISWlNNHlsVGI3NkJnCm4vVzJUMm1URUZvUFhpZzRSZDVOQXlVMkNrckFsMnhqN3NhL3o3TmVJT0tDSVdibCt3TklsUjI5VllETjBMYlIKNFBqT1I1MlVQU0dpV0t3SUF2TklGZTVVdXZXZzNIQ0xDTmlRTEMzM0VBOVo4MmNwSVoxdkUrOG1rbGtteXdOcwpQczJvUGQ5eVNTZUpUalU4clM2MDBxVjRCTTdPcTcwYjBMKzdyRjQ4OTMwaTFaMis4MjcxanFMK2FUWGI2elR1CjRuQ0pIczNyRUpsZ0tqQUl5UDlaMVF3MmphcEsxUlFvc3V6V2I0aUcwcjFDOWwwSzJBaW1PdDhYTkMvcUtSdk0KQm85Z0VRSmhBb0dCQU1MdlFIdk03MmxITDVtTWp2eE9FOFJoVk9maHJUQWNtcmMvRDIxU1JTZ3VhNFA1K1NjTwo1aFJBT0NWWUpleWo0aUhSVTI1OU00aUV1aGowZXdhZ3U2a29xN3g3QXZ3TjNxdEZIVHUvWnJsVFhpdE9WaEt6Clo3OVZjNXZHMm80SjUvRUppNm5VamRQeXZ5Mm1xWHp2aE92MmJJeVJQcTBUbnZFWWYwM21JUXZEQW9HQkFQZkwKcWxPUVRZOWx3OUpHUlNoMk1GQW9XMXBIUlpYTDhjQW5NblhEVnBXVkg5QVFJb2VBVnpteVdHMllkV2lBRXZ6dQp6MkhueFZqUHpMc0JLQkVHN3hLSWx6emNGZ2tIUkIxMVl6YXFGNUpRTGROYjFNZUNXSmNTdEx5L3FGWFVvSDRDCldPbXBGWHRCWFJkUjVxSkp6Yi83c1Z2OVBwTDB3OTk2WWRBWDZCUVBBb0dCQUpWdWtORVdqYVQzdy82Q2FJM3oKVUdYbmN3MzZ5eWVwbGRUSmk0cnpXVDV2TDA1Ump2U3BFQ2tQL2JwcTgwK1BaZWNrcno5d3pOTm5ZNzJEbE5mRQoyWGJZVGFaRDZrck1XeGlSOTlINGJNZStwOTZzdzRDOGROaVFxZm9ObXpidFV4ZE1pUHJjalFpZitud0ZXY0lEClhyTUFDY0JNQzI3a0xxQ0ZkZm1DWTJ5L0FvR0JBT3ZUTFllWHB1alkvZE5Kd3htdDJXNy82V2p5dVh2RmU0N1cKL3dQcVlxVzdKV3FiWUhFNnFFaWx2ZGlYcHUxTUxrWC9kT2lGYm1DR2F4NXlERktnR2JpMnU5QlUyTGZBN1lkbgpwNE5udjBVay8yZk9WcU9GSHBDd1ljZmNVdlZVaFdWSEVKMVhxTFVEMFBlWG4zcEY2UVZVSVVnZHJJYXBZUngzCldVMTA0dzdyQW9HQVhCT21BaU9HRStTaH
o0SG5aZmo4ZEVKdnNXcjN2OTh2OFpaK3VyYmQyRUFVNUwyeENHdGUKYzlCM1BLZXFOTFhzWkZDaER3aHg4WnE0eml1UGE1cjVTUzI0aFpNNEFjWGlpVkJVY1Y4YWhqNlN6TnRxUTlKUAp2QW1BdHNDM25ZYVhPOFIyS1FWcjBhT3JJemFGbE9qZW8xYTRtVjFwMVBtemY1Szc1M0lVc2NVPQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
kind: Secret
metadata:
  annotations:
    meta.helm.sh/release-name: spark-operator
    meta.helm.sh/release-namespace: spark-operator
  creationTimestamp: "2024-06-05T03:28:16Z"
  labels:
    app.kubernetes.io/instance: spark-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: spark-operator
    app.kubernetes.io/version: v1beta2-1.6.0-3.5.0
    helm.sh/chart: spark-operator-1.4.0
  name: spark-operator-webhook-certs
  namespace: spark-operator
  resourceVersion: "1389"
  uid: b71143f7-d117-4080-bfb5-018ce50ca766
type: Opaque
  1. Submit a SparkApplication:
# spark-pi.yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.1
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.1.jar
  sparkVersion: "3.5.1"
  restartPolicy:
    type: Never
  volumes:
  - name: test-volume
    hostPath:
      path: /tmp
      type: Directory
  driver:
    volumeMounts:
    - name: test-volume
      mountPath: /tmp
    serviceAccount: spark-operator-spark
    labels:
      version: "3.5.1"
  executor:
    instances: 1
    volumeMounts:
    - name: test-volume
      mountPath: /tmp
    labels:
      version: "3.5.1"
kubectl apply -f spark-pi.yaml
  1. Inspect whether the volume is mounted successfully by webhook:
$ kubectl get pod spark-pi-driver -o json | jq '.spec.containers[0].volumeMounts' 
[
  {
    "mountPath": "/var/data/spark-9e473bfd-e7ef-47ca-ba5f-b9840591a8fb",
    "name": "spark-local-dir-1"
  },
  {
    "mountPath": "/opt/spark/conf",
    "name": "spark-conf-volume-driver"
  },
  {
    "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
    "name": "kube-api-access-ck8lk",
    "readOnly": true
  },
  {
    "mountPath": "/tmp",
    "name": "test-volume"
  }
]
  1. Inspect the spark operator logs to verify webhook server works:
$ kubectl logs -n spark-operator spark-operator-79bb9ffdd7-pkwt5 | grep webhook.go
I0605 03:37:57.589167      13 webhook.go:366] Updated webhook secret spark-operator/spark-operator-webhook-certs
I0605 03:37:57.589416      13 webhook.go:218] Starting the Spark admission webhook server
I0605 03:37:57.590365      13 webhook.go:484] Updating existing MutatingWebhookConfiguration for the Spark pod admission webhook
I0605 03:38:07.577914      13 webhook.go:244] Serving admission request
I0605 03:38:07.580560      13 webhook.go:616] Pod spark-pi-driver in namespace default is subject to mutation
I0605 03:38:10.184861      13 webhook.go:244] Serving admission request
I0605 03:38:10.185471      13 webhook.go:616] Pod spark-pi-7ae9138fe679bede-exec-1 in namespace default is subject to mutation
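
For the secret-inspection step above, decoding a single field is usually easier to read than the raw base64. The following is a minimal sketch, assuming GNU base64 and a Secret rendered as YAML on stdin with each data value on one line; decode_secret_field is a hypothetical helper name, not part of the operator or the chart:

```shell
# decode_secret_field KEY
# Reads a Secret manifest (YAML) on stdin and prints the base64-decoded
# value stored under KEY. Hypothetical helper for illustration only; it
# assumes each data entry is written on a single line as "  KEY: <base64>".
decode_secret_field() {
  awk -v k="$1" '$1 == (k ":") { print $2 }' | base64 -d
}

# Possible usage against the secret shown above (requires cluster access,
# and openssl if the certificate fields should be printed):
#   kubectl get secret -n spark-operator spark-operator-webhook-certs -o yaml \
#     | decode_secret_field ca-cert.pem \
#     | openssl x509 -noout -subject -enddate
```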

ChenYi015 avatar Jun 05 '24 03:06 ChenYi015

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vara-bonthu, yuchaoran2011

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [vara-bonthu,yuchaoran2011]

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

google-oss-prow[bot] avatar Jun 05 '24 14:06 google-oss-prow[bot]

Hey @ChenYi015 , can you clarify why this is a breaking change? I'm working on upgrading my team's spark-operator repo and while I can go forwards to the new helm chart that includes this change, I can't revert back without getting an error about service accounts not existing (even though we configured them). I'm wondering if it's due to the order in which rbac resources get created. Before, they were created with helm hooks, and now they're created by the chart installation process. What do you think?

colinsteidtmann avatar Jul 03 '24 21:07 colinsteidtmann

@colinsteidtmann Previously, the rbac resources for the operator were created by helm pre-install and pre-upgrade hooks, but not by a pre-rollback hook. Thus, when you try to roll back the chart, the rbac resources will not be created.
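
To check which lifecycle events a release's hooks are bound to, the helm.sh/hook annotations in the hooks manifest can be listed. This is a minimal sketch, not something from the chart itself: list_hook_events is a hypothetical helper, and it assumes each annotation is written on a single line, unquoted or with double quotes.

```shell
# list_hook_events
# Reads a Helm hooks manifest on stdin and prints the distinct events named
# in "helm.sh/hook" annotations, one per line. Hypothetical helper; related
# annotations such as helm.sh/hook-weight are deliberately not matched.
list_hook_events() {
  grep -E '^[[:space:]]*"?helm\.sh/hook"?:' \
    | sed -E 's/^[^:]*:[[:space:]]*//; s/"//g' \
    | tr ',' '\n' \
    | sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//; /^$/d' \
    | sort -u
}

# Possible usage (requires access to the release):
#   helm get hooks -n spark-operator spark-operator | list_hook_events
# If pre-rollback is absent from the output, resources created only by
# hooks are not recreated when the release is rolled back.
```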

ChenYi015 avatar Jul 04 '24 01:07 ChenYi015

Thanks, we're actually using Terraform's helm provider to manage our helm releases, so our "rollback" is effectively changing helm chart versions and running terraform apply. I'm having trouble figuring out which hooks get triggered and when; I thought terraform apply would always trigger either the upgrade or install hook, but maybe not. Do you have any ideas on how we can roll back spark operator smoothly? Is it possible to create the rbac resources manually?

colinsteidtmann avatar Jul 04 '24 02:07 colinsteidtmann

@colinsteidtmann When you run terraform apply to "roll back" the chart, the pre-install/pre-upgrade hook will be triggered and the rbac resources will be created. But during the upgrade process, helm compares the two versions; the rbac resources show up in the newer version but not in the older version, so helm deletes them, causing the serviceaccount not found error. You can create the rbac resources manually as follows:

# Get hooks manifest
helm get hooks -n spark-operator spark-operator > hooks.yaml

Then edit hooks.yaml and change the namespace of the serviceaccount to the release namespace. Then create the hook resources:

kubectl apply -f hooks.yaml
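
The manual edit of hooks.yaml could also be scripted. This is a minimal sketch under assumptions: rewrite_namespace is a hypothetical helper, the old namespace has to be read from your own hooks.yaml, and namespace values are assumed to be unquoted and on their own line. It rewrites every matching namespace field, including binding subjects that reference the serviceaccount.

```shell
# rewrite_namespace OLD NEW
# Reads a manifest on stdin and replaces every "namespace: OLD" field with
# "namespace: NEW", keeping the original indentation. Hypothetical helper;
# quoted namespace values are not handled.
rewrite_namespace() {
  sed -E "s/^([[:space:]]*namespace:[[:space:]]*)${1}[[:space:]]*\$/\\1${2}/"
}

# Possible usage, writing to a separate file so the original is kept
# (replace <old-namespace> with the value found in hooks.yaml):
#   rewrite_namespace <old-namespace> spark-operator < hooks.yaml > hooks-fixed.yaml
#   kubectl apply -f hooks-fixed.yaml
```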

ChenYi015 avatar Jul 04 '24 05:07 ChenYi015