Add doc to note how to clean up `Volcano` completely
What would you like to be added:
Add doc to note how to clean up Volcano completely
I0312 06:48:38.683192 1 event.go:291] "Event occurred" object="volcano-system/volcano-admission-54b4798bff" kind="ReplicaSet" apiVersion="apps/v1" type="Warning" reason="FailedCreate" message="Error creating: Internal error occurred: failed calling webhook \"mutatepod.volcano.sh\": failed to call webhook: Post \"https://volcano-admission-service.volcano-system.svc:443/pods/mutate?timeout=10s\": dial tcp xxx.xxx.xxx.xxx:443: connect: connection refused"
Why is this needed:
kubectl apply -f ./installer/volcano-development-arm64.yaml
kubectl delete -f ./installer/volcano-development-arm64.yaml
When reinstalling Volcano with apply and delete, the validatingwebhookconfigurations and mutatingwebhookconfigurations are left behind, so we still need to clean them up before re-applying:
kubectl delete validatingwebhookconfigurations volcano-admission-service-jobs-validate volcano-admission-service-pods-validate volcano-admission-service-queues-validate
kubectl delete mutatingwebhookconfigurations volcano-admission-service-jobs-mutate volcano-admission-service-podgroups-mutate volcano-admission-service-pods-mutate volcano-admission-service-queues-mutate
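The manual cleanup can also be wrapped in a small helper so the individual webhook names do not have to be typed out. This is only a sketch: it assumes every Volcano webhook configuration name starts with `volcano-admission-service-`, as the names in this issue do, and the function names `volcano_webhook_names` and `delete_volcano_webhooks` are made up for illustration.

```shell
#!/bin/sh
# Sketch: delete all Volcano admission webhook configurations by name prefix.
# Assumption: Volcano's webhook configurations all start with
# "volcano-admission-service-", matching the names listed in this issue.

volcano_webhook_names() {
  # $1 is the resource kind, e.g. validatingwebhookconfigurations.
  # `-o name` prints one "<resource>/<name>" entry per line.
  kubectl get "$1" -o name | grep '/volcano-admission-service-' || true
}

delete_volcano_webhooks() (
  for kind in validatingwebhookconfigurations mutatingwebhookconfigurations; do
    for name in $(volcano_webhook_names "$kind"); do
      # "<resource>/<name>" is accepted directly by kubectl delete.
      kubectl delete "$name"
    done
  done
)

# Usage, after `kubectl delete -f ./installer/volcano-development-arm64.yaml`:
#   delete_volcano_webhooks
```

Matching on a prefix rather than a fixed list means the helper keeps working if a release adds or removes a webhook, at the cost of also deleting any unrelated configuration that happens to share the prefix.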
Supplement (see https://github.com/volcano-sh/volcano/issues/2102#issuecomment-1072295632): volcano-admission-init creates a secret that is not cleaned up thoroughly.
For some reason I still see some events like:
Error creating: Internal error occurred: failed calling webhook "mutatepod.volcano.sh": Post "https://volcano-admission-service.volcano-system.svc:443/pods/mutate?timeout=10s": x509: certificate has expired or is not yet valid: current time 2022-03-19T07:53:35Z is after 2021-07-03T10:00:41Z
This happens even after deleting Volcano and its validatingwebhookconfigurations and mutatingwebhookconfigurations.
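An expired-certificate error like this may come from a leftover admission secret that is reused on reinstall. One way to check is to print the validity dates of the certificate stored in the secret. The sketch below assumes the secret is named `volcano-admission-secret` in the `volcano-system` namespace and stores the certificate under the key `tls.crt`; none of these are confirmed in this thread, so verify them first with `kubectl get secrets -n volcano-system`. The function name `admission_cert_dates` is hypothetical.

```shell
#!/bin/sh
# Sketch: show notBefore/notAfter of the admission webhook certificate.
# Assumptions (not confirmed in this thread): the secret is called
# "volcano-admission-secret", lives in "volcano-system", and keeps the
# certificate under the "tls.crt" key.

admission_cert_dates() (
  ns="${1:-volcano-system}"
  secret="${2:-volcano-admission-secret}"
  # Secret data is base64-encoded; decode it and let openssl print the dates.
  kubectl get secret "$secret" -n "$ns" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -dates
)

# Usage:
#   admission_cert_dates                      # default namespace and secret
#   admission_cert_dates my-namespace my-secret
```

If the `notAfter` date is in the past, deleting the stale secret before re-applying the installer should let the init job generate a fresh one.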
Here are some improvements I would like for the webhook:
- Instead of generating the webhook configuration in code, write it in YAML and deploy it with the other manifests. This makes it easy to change the webhook's scope and to ignore some pods.
- If possible, narrow the webhook's scope. It currently intercepts all pods, so once Volcano has a problem, the entire cluster cannot create pods.
Agree with you.
On your first point: generating the cert is probably difficult to do in YAML. I understand why a Job generates it now: it makes installation easy for users, but graceful cleanup was not considered. My suggestion is to add a separate Job for cleanup, just as the installation uses one.
On the second point, I think it may affect the whole design because of how pod resource usage is calculated, so it is not a simple problem (I'm not very familiar with this part yet). All in all, running Volcano in HA can avoid this.
I think the simplest solution for now is to add a delete hook (Job) that deletes these webhooks individually :)
https://kubernetes.io/zh/docs/reference/access-authn-authz/extensible-admission-controllers/#matching-requests-namespaceselector
We should exclude pods in the kube-system and volcano-system namespaces.
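As a stopgap, the exclusion could be applied by patching a `namespaceSelector` onto the existing pod webhook. This is a sketch under several assumptions: the pod webhook lives in `volcano-admission-service-pods-mutate` (the name used in this issue), it is the first entry in that configuration's `webhooks` list, and the cluster sets the `kubernetes.io/metadata.name` label on namespaces (Kubernetes 1.21 or later). Because Volcano generates its webhook configurations in code, the patch may be overwritten when the admission service restarts. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: make a webhook skip the kube-system and volcano-system namespaces.
# Assumption: the webhook to change is entry 0 of the configuration.

namespace_exclusion_patch() {
  # JSON Patch (RFC 6902) that sets a namespaceSelector on webhooks[0].
  cat <<'EOF'
[
  {
    "op": "add",
    "path": "/webhooks/0/namespaceSelector",
    "value": {
      "matchExpressions": [
        {
          "key": "kubernetes.io/metadata.name",
          "operator": "NotIn",
          "values": ["kube-system", "volcano-system"]
        }
      ]
    }
  }
]
EOF
}

exclude_system_namespaces() {
  # $1: name of the MutatingWebhookConfiguration to patch.
  kubectl patch mutatingwebhookconfiguration "$1" \
    --type=json -p "$(namespace_exclusion_patch)"
}

# Usage:
#   exclude_system_namespaces volcano-admission-service-pods-mutate
```

With this selector in place, pods created in the two excluded namespaces are no longer sent to the webhook, so a broken admission service cannot block Volcano's own pods or core cluster components from starting.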
For future users with this issue, I resolved it with:
kubectl delete validatingwebhookconfigurations volcano-admission-service-pods-validate volcano-admission-service-jobs-validate volcano-admission-service-queues-validate
kubectl delete mutatingwebhookconfigurations volcano-admission-service-podgroups-mutate volcano-admission-service-jobs-mutate volcano-admission-service-pods-mutate volcano-admission-service-queues-mutate
I have deleted the validatingwebhookconfigurations and mutatingwebhookconfigurations as you mentioned, but when I run "kubectl apply -f mpi-example.yaml", it only outputs "job.batch.volcano.sh/lm-mpi-job configured", and "kubectl get pod" outputs "No resources found in default namespace". How can I solve this problem?
Fixed in https://github.com/volcano-sh/volcano/pull/2346 /close
@hwdef: Closing this issue.