deploy webhook by yaml

Open hwdef opened this issue 1 year ago • 8 comments

ref: https://github.com/volcano-sh/volcano/issues/2333

steps:

  • Remove the Go code that creates the webhook.
  • Create YAML files for deploying the webhook, leaving the certificate field blank.
  • Fill in the certificate fields at runtime via volcano-admission.

One small problem: filling in the certificate fields through volcano-admission takes some time. During this period, volcano-scheduler will restart because it cannot connect to the webhook.

I don't know whether this is a serious problem; please give me some advice. @Thor-wl @william-wang @shinytang6 @qiankunli

hwdef avatar Jul 11 '22 08:07 hwdef

During this period, volcano-scheduler will restart because it cannot connect to the webhook.

Will the scheduler connect to the webhook? I am a little confused here.

shinytang6 avatar Jul 21 '22 11:07 shinytang6

Yes, I got the following error log:

$ kubectl logs -n volcano-system       volcano-scheduler-5499bd684f-v6c8g  -p 
2022/07/22 06:50:51 maxprocs: Leaving GOMAXPROCS=16: CPU quota undefined
W0722 06:50:51.150374       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
panic: failed init default queue, with err: Internal error occurred: failed calling webhook "mutatequeue.volcano.sh": failed to call webhook: Post "https://volcano-admission-service.volcano-system.svc:443/queues/mutate?timeout=10s": dial tcp 10.96.139.202:443: connect: connection refused

goroutine 1 [running]:
volcano.sh/volcano/pkg/scheduler/cache.newSchedulerCache(0x7ffcc03f4b15?, {0x1c1c30d, 0x7}, {0x1c1b6db, 0x7}, {0x0, 0x0, 0x0?})
        /home/edward/code/go/volcano/pkg/scheduler/cache/cache.go:402 +0x20fc
volcano.sh/volcano/pkg/scheduler/cache.New(...)
        /home/edward/code/go/volcano/pkg/scheduler/cache/cache.go:87
volcano.sh/volcano/pkg/scheduler.NewScheduler(0x0?, {0x1c1c30d?, 0x7?}, {0x7ffcc03f4b15, 0x29}, 0x3b9aca00, {0x1c1b6db?, 0x7?}, {0x0, 0x0, ...})
        /home/edward/code/go/volcano/pkg/scheduler/scheduler.go:75 +0xf0
volcano.sh/volcano/cmd/scheduler/app.Run(0xc0001fc2d0)
        /home/edward/code/go/volcano/cmd/scheduler/app/server.go:76 +0x1bf
main.main()
        /home/edward/code/go/volcano/cmd/scheduler/main.go:65 +0x190

hwdef avatar Jul 22 '22 06:07 hwdef

@hwdef Yes, it is a serious problem. Would you elaborate on "volcano-scheduler restart because it can not connect to the webhook"? The scheduler has no direct connection to the webhook.

william-wang avatar Jul 28 '22 09:07 william-wang

After discussing it with @shinytang6, we have roughly located the problem. I will try to solve it later.

hwdef avatar Jul 28 '22 09:07 hwdef

The error above is caused by the failed webhook call made when creating the initial queue. ~~To handle this, I use the wait package to retry creating the initial queue once per second until it succeeds.~~

~~In addition, I set a timeout of 5 minutes, so that when a fatal problem is encountered the program can exit by panicking.~~ ~~I'm not sure whether this solution would work.~~ @william-wang @shinytang6 What do you think?

hwdef avatar Aug 02 '22 03:08 hwdef

@shinytang6 I now use the retry package to create the initial queue. Please review it again. Thanks!

hwdef avatar Aug 04 '22 10:08 hwdef

Another question: since we now apply all the webhook configurations, will the enabled-admission argument still work as expected?

After checking where this parameter comes from and the related code, I think that deploying the webhook and this parameter are two separate functions that have no effect on each other.

hwdef avatar Aug 05 '22 08:08 hwdef

@shinytang6 @william-wang @Thor-wl I think this PR is ready for review. Please take a look.

hwdef avatar Aug 12 '22 02:08 hwdef

Generally LGTM.

shinytang6 avatar Aug 12 '22 03:08 shinytang6

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Thor-wl

volcano-sh-bot avatar Aug 12 '22 03:08 volcano-sh-bot