Cannot create a job

Open · kongjibai opened this issue 2 years ago · 6 comments

What happened: Running kubectl apply -f lm-horovod-tf-mnist-v0.5.yaml to create a Horovod TF job prints job.batch.volcano.sh/lm-horovod-job created (or job.batch.volcano.sh/lm-horovod-job configured), but kubectl get pod then prints No resources found in default namespace.

What you expected to happen: kubectl get pod lists the master and worker pods started by the job, along with their status.

How to reproduce it (as minimally and precisely as possible): It was working fine a few days ago. At first, running kubectl apply -f lm-horovod-tf-mnist-v0.5.yaml failed with:

Error from server (InternalError): error when creating "hvd-job-tf-mnist.yaml": Internal error occurred: failed calling webhook "mutatejob.volcano.sh": Post "https://volcano-admission-service.volcano-system.svc:443/jobs/mutate?timeout=10s": dial tcp 10.86.213.31:443: connect: connection refused

Following issue 2079, I deleted the validatingwebhookconfigurations and mutatingwebhookconfigurations. Since then, it no longer works as it did before.
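A connection refused error from a webhook call usually means the API server reached the webhook Service address but nothing accepted the connection, which often points at the admission pod not running or not ready. The sketch below lists a few checks for that case. It assumes the volcano-system namespace and the volcano-admission-service Service name, both taken from the error message above. The run and check_admission_webhook helpers and the DRY_RUN switch are hypothetical, added only so the commands can be previewed without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check why the Volcano admission webhook refuses connections.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

check_admission_webhook() {
  # Are the pods in volcano-system Running and Ready?
  run kubectl -n volcano-system get pods
  # Does the Service named in the error have ready endpoints?
  # An ENDPOINTS value of <none> means no ready pod backs the Service.
  run kubectl -n volcano-system get endpoints volcano-admission-service
  # Which webhook configurations are currently registered?
  run kubectl get mutatingwebhookconfigurations
  run kubectl get validatingwebhookconfigurations
}
```

Run DRY_RUN=1 check_admission_webhook to print the commands, or check_admission_webhook to execute them against the current kubectl context.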

Anything else we need to know?:

Environment:

  • Volcano Version: v1.5.1
  • Kubernetes version (use kubectl version): v1.21.3
  • Cloud provider or hardware configuration: local K8s cluster
  • OS (e.g. from /etc/os-release): CentOS Linux release 7.6.1810 (Core)
  • Kernel (e.g. uname -a): x86_64 GNU/Linux
  • Install tools: kubectl apply -f volcano-development.yaml
  • Others:

kongjibai, May 06 '22

You may need to reinstall Volcano after deleting these webhook configurations.

shinytang6, May 06 '22

Thanks. I uninstalled it with kubectl delete -f volcano-development.yaml and reinstalled it, but it may not have been uninstalled completely, because the reinstall command finished within 1 second. How can I clean up Volcano completely?

kongjibai, May 06 '22

If you installed Volcano with volcano-development.yaml, a clean reinstall might look like this:

$ kubectl delete -f volcano-development.yaml
$ kubectl delete validatingwebhookconfigurations volcano-admission-service-pods-validate volcano-admission-service-jobs-validate volcano-admission-service-queues-validate
$ kubectl delete mutatingwebhookconfigurations volcano-admission-service-podgroups-mutate volcano-admission-service-jobs-mutate volcano-admission-service-pods-mutate volcano-admission-service-queues-mutate
$ kubectl apply -f volcano-development.yaml
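The same four steps can be wrapped in a small script. This is a sketch under stated assumptions: the reinstall_volcano and run helpers and the DRY_RUN switch are hypothetical; --ignore-not-found (a standard kubectl delete flag) is added so the script does not stop when a resource is already gone; and the final wait assumes the admission Deployment is named volcano-admission, as in the stock volcano-development.yaml. That last step is meant to reduce the chance of hitting the webhook connection refused error from the original report, by waiting until admission is up before any jobs are submitted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clean reinstall of Volcano from a single manifest.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reinstall_volcano() {
  local manifest="${1:-volcano-development.yaml}"
  # 1. Remove everything the manifest created.
  run kubectl delete -f "$manifest" --ignore-not-found
  # 2. Remove the admission webhook configurations named in this thread.
  run kubectl delete validatingwebhookconfigurations --ignore-not-found \
    volcano-admission-service-pods-validate \
    volcano-admission-service-jobs-validate \
    volcano-admission-service-queues-validate
  run kubectl delete mutatingwebhookconfigurations --ignore-not-found \
    volcano-admission-service-podgroups-mutate \
    volcano-admission-service-jobs-mutate \
    volcano-admission-service-pods-mutate \
    volcano-admission-service-queues-mutate
  # 3. Install again from the same manifest.
  run kubectl apply -f "$manifest"
  # 4. Wait for the (assumed) admission Deployment before submitting jobs.
  run kubectl -n volcano-system rollout status deployment/volcano-admission --timeout=120s
}
```

Preview with DRY_RUN=1 reinstall_volcano, then run reinstall_volcano (optionally passing a different manifest path) to apply it.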

shinytang6, May 06 '22

Thanks, I will try as you suggested.

kongjibai, May 06 '22

Yes, this cleans up Volcano, but how do I solve the problem where kubectl get pod prints No resources found in default namespace? I did apply the job with 'kubectl apply -f ****.yaml'. Since this problem keeps appearing, is it a bug?
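A Volcano Job that kubectl accepts but that never produces pods can stall at several points: the Job object itself, its PodGroup (whose phase shows whether the scheduler has admitted it), or the Volcano components that act on it. The sketch below collects commands to narrow this down. It assumes the job name lm-horovod-job from the report, the vcjob short name for jobs.batch.volcano.sh, and controller and scheduler Deployments named volcano-controllers and volcano-scheduler in volcano-system (the defaults in volcano-development.yaml, not confirmed in this thread). The diagnose_vcjob and run helpers and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find where a Volcano Job stalls before creating pods.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

diagnose_vcjob() {
  local job="$1" ns="${2:-default}"
  # Does the Job exist, and what state and events does it report?
  run kubectl -n "$ns" get vcjob "$job"
  run kubectl -n "$ns" describe vcjob "$job"
  # Was a PodGroup created for it, and is it stuck in the Pending phase?
  run kubectl -n "$ns" get podgroups
  # Recent logs from the (assumed) controller and scheduler Deployments.
  run kubectl -n volcano-system logs deployment/volcano-controllers --tail=50
  run kubectl -n volcano-system logs deployment/volcano-scheduler --tail=50
}
```

For the job in this report, that would be diagnose_vcjob lm-horovod-job default.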

kongjibai, May 26 '22

Hello 👋 Looks like there was no activity on this issue for the last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this issue or push a commit. Thanks! 🤗 If there is no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot], Sep 08 '22