volcano
deploy webhook by yaml
ref: https://github.com/volcano-sh/volcano/issues/2333
steps:
- Remove the Go code that creates the webhook.
- Add YAML files that deploy the webhook, leaving the certificate field blank.
- Fill in the certificate fields from volcano-admission.
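As a rough illustration of the third step, the Go sketch below fills an empty CA bundle on webhook entries that were deployed with the field blank. The `webhook` struct and the `fillCABundle` function are hypothetical stand-ins, not Volcano's or Kubernetes' real types; a real implementation would update the webhook configurations through the Kubernetes API. Leaving an already-set bundle untouched is also an assumption of this sketch.

```go
package main

import "fmt"

// webhook is a simplified, hypothetical stand-in for one entry of a
// webhook configuration deployed from YAML.
type webhook struct {
	Name     string
	CABundle []byte // left empty by the YAML manifest
}

// fillCABundle copies ca into every webhook whose CABundle is still empty
// and returns how many entries it changed. Entries that already carry a
// bundle are left as they are (an assumption of this sketch).
func fillCABundle(hooks []webhook, ca []byte) int {
	changed := 0
	for i := range hooks {
		if len(hooks[i].CABundle) == 0 {
			// Copy so that later changes to ca do not leak into the entry.
			hooks[i].CABundle = append([]byte(nil), ca...)
			changed++
		}
	}
	return changed
}

func main() {
	hooks := []webhook{{Name: "mutatequeue.volcano.sh"}}
	ca := []byte("placeholder CA bytes") // not a real certificate
	fmt.Printf("patched %d webhook(s)\n", fillCABundle(hooks, ca))
}
```

Until this step has run, calls to the webhook can fail, which is the window the discussion below is about.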
A small problem: volcano-admission fills in the certificate fields with some delay. During this period, volcano-scheduler will restart because it cannot connect to the webhook.
I don't know whether this is a serious problem; please give me some advice. @Thor-wl @william-wang @shinytang6 @qiankunli
"During this period, volcano-scheduler will restart because it cannot connect to the webhook."
Will the scheduler connect to the webhook? I am a little confused here.
Yes, I got this error log:
$ kubectl logs -n volcano-system volcano-scheduler-5499bd684f-v6c8g -p
2022/07/22 06:50:51 maxprocs: Leaving GOMAXPROCS=16: CPU quota undefined
W0722 06:50:51.150374 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
panic: failed init default queue, with err: Internal error occurred: failed calling webhook "mutatequeue.volcano.sh": failed to call webhook: Post "https://volcano-admission-service.volcano-system.svc:443/queues/mutate?timeout=10s": dial tcp 10.96.139.202:443: connect: connection refused
goroutine 1 [running]:
volcano.sh/volcano/pkg/scheduler/cache.newSchedulerCache(0x7ffcc03f4b15?, {0x1c1c30d, 0x7}, {0x1c1b6db, 0x7}, {0x0, 0x0, 0x0?})
/home/edward/code/go/volcano/pkg/scheduler/cache/cache.go:402 +0x20fc
volcano.sh/volcano/pkg/scheduler/cache.New(...)
/home/edward/code/go/volcano/pkg/scheduler/cache/cache.go:87
volcano.sh/volcano/pkg/scheduler.NewScheduler(0x0?, {0x1c1c30d?, 0x7?}, {0x7ffcc03f4b15, 0x29}, 0x3b9aca00, {0x1c1b6db?, 0x7?}, {0x0, 0x0, ...})
/home/edward/code/go/volcano/pkg/scheduler/scheduler.go:75 +0xf0
volcano.sh/volcano/cmd/scheduler/app.Run(0xc0001fc2d0)
/home/edward/code/go/volcano/cmd/scheduler/app/server.go:76 +0x1bf
main.main()
/home/edward/code/go/volcano/cmd/scheduler/main.go:65 +0x190
@hwdef Yes, it is a serious problem. Would you elaborate on "volcano-scheduler restart because it can not connect to the webhook"? The scheduler has no direct connection to the webhook.
After discussing it with @shinytang6, we have roughly located the problem. I will try to solve it later.
The error mentioned above is caused by the failure to call the webhook when creating the initial queue. ~~For this I use the wait package to retry the creation of the initial queue at a frequency of once per second. Until it is created successfully.~~
~~In addition, I also set a timeout of 5 minutes, and when a fatal problem is encountered, the program can be exited by panic.~~ ~~Not sure if such a solution would work.~~ @william-wang @shinytang6 What do you think?
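The retry-with-timeout idea described above can be sketched as follows. This is a minimal illustration using only the Go standard library, not the code in this PR and not the `wait` or `retry` packages mentioned in the thread. `retryUntil` and `simulateStartup` are hypothetical names, and the interval and timeout are shortened from the one second and five minutes discussed above so that the example finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryUntil calls fn right away and then once per interval until fn
// succeeds or timeout elapses. On timeout it returns the last error.
func retryUntil(interval, timeout time.Duration, fn func() error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		err := fn()
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("giving up after %s: %w", timeout, err)
		case <-ticker.C:
			// Wait for the next tick, then try again.
		}
	}
}

// simulateStartup models a webhook that refuses the first `failures`
// calls (for example while its certificate is not yet in place) and
// accepts every call after that. It returns the number of attempts made.
func simulateStartup(failures int) (attempts int, err error) {
	createQueue := func() error {
		attempts++
		if attempts <= failures {
			return errors.New("connection refused")
		}
		return nil
	}
	err = retryUntil(10*time.Millisecond, time.Second, createQueue)
	return attempts, err
}

func main() {
	attempts, err := simulateStartup(3)
	fmt.Println(attempts, err) // prints "4 <nil>"
}
```

With a bounded timeout, the caller still sees an error (and can panic, as proposed above) when the webhook never becomes reachable, instead of retrying forever.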
@shinytang6 I now use the `retry` package to create the initial queue; please review it again. Thanks!
Another question: since we now apply all the webhook configurations, will the `enabled-admission` argument still work as expected?
After checking where this parameter comes from and the related code, I think that deploying the webhook and this parameter are two separate features that have no effect on each other.
@shinytang6 @william-wang @Thor-wl I think this PR is ready to review. Please take a look.
Generally LGTM.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Thor-wl