karpenter
chore: migrate knative logger to controller-runtime injected logger
Fixes: https://github.com/kubernetes-sigs/karpenter/issues/922
Description: Migrate the knative logger to the controller-runtime injected logger.
After this change, the log messages look like the following:
{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller","msg":"discovered karpenter version","version":"unspecified"}
{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller","msg":"webhook disabled"}
{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8000","secure":false}
{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller","msg":"starting server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":1710411206.3906345,"logger":"fallback","caller":"leaderelection/leaderelection.go:250","msg":"attempting to acquire leader lease kube-system/karpenter-leader-election..."}
{"level":"info","ts":1710411221.86726,"logger":"fallback","caller":"leaderelection/leaderelection.go:260","msg":"successfully acquired lease kube-system/karpenter-leader-election"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.provisioner","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.disruption.queue","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.eviction-queue","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"provisioner.trigger.pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"provisioner.trigger.pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"provisioner.trigger.pod","controllerGroup":"","controllerKind":"Pod","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.disruption","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodepool.hash","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodepool.hash","controllerGroup":"karpenter.sh","controllerKind":"NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"provisioner.trigger.node","controllerGroup":"","controllerKind":"Node","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"provisioner.trigger.node","controllerGroup":"","controllerKind":"Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.node","controllerGroup":"","controllerKind":"Node","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.node","controllerGroup":"","controllerKind":"Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.daemonset","controllerGroup":"apps","controllerKind":"DaemonSet","source":"kind source: *v1.DaemonSet"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.daemonset","controllerGroup":"apps","controllerKind":"DaemonSet"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"node.termination","controllerGroup":"","controllerKind":"Node","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"node.termination","controllerGroup":"","controllerKind":"Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.nodeclaim","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.nodeclaim","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"metrics.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"metrics.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"metrics.pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"metrics.pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.metrics.node","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.nodeclaim.garbagecollection","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"lease.garbagecollection","controllerGroup":"coordination.k8s.io","controllerKind":"Lease","source":"kind source: *v1.Lease"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"lease.garbagecollection","controllerGroup":"coordination.k8s.io","controllerKind":"Lease"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.disruption","msg":"waiting on cluster sync"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodepool.hash","controllerGroup":"karpenter.sh","controllerKind":"NodePool","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"provisioner.trigger.node","controllerGroup":"","controllerKind":"Node","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"state.node","controllerGroup":"","controllerKind":"Node","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"state.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"state.pod","controllerGroup":"","controllerKind":"Pod","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"node.termination","controllerGroup":"","controllerKind":"Node","worker count":100}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"state.nodeclaim","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"metrics.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool","worker count":1}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"metrics.pod","controllerGroup":"","controllerKind":"Pod","worker count":1}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":1000}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":100}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:42Z","logger":"controller","msg":"Starting workers","controller":"lease.garbagecollection","controllerGroup":"coordination.k8s.io","controllerKind":"Lease","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:42Z","logger":"controller","msg":"Starting workers","controller":"state.daemonset","controllerGroup":"apps","controllerKind":"DaemonSet","worker count":10}
How was this change tested?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
/hold
ping @jonathan-innis
Pull Request Test Coverage Report for Build 9165359761
Warning: This coverage report may be inaccurate.
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
- For more information on this, see Tracking coverage changes with pull request builds.
- To avoid this issue with future PRs, see these Recommended CI Configurations.
- For a quick fix, rebase this PR at GitHub. Your next report should be accurate.
Details
- 98 of 150 (65.33%) changed or added relevant lines in 46 files are covered.
- 3 unchanged lines in 2 files lost coverage.
- Overall coverage increased (+0.1%) to 78.563%
Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
---|---|---|---
kwok/main.go | 0 | 1 | 0.0%
pkg/controllers/disruption/helpers.go | 3 | 4 | 75.0%
pkg/controllers/disruption/types.go | 0 | 1 | 0.0%
pkg/controllers/provisioning/scheduling/scheduler.go | 6 | 7 | 85.71%
pkg/controllers/disruption/multinodeconsolidation.go | 1 | 3 | 33.33%
pkg/controllers/disruption/singlenodeconsolidation.go | 1 | 3 | 33.33%
pkg/controllers/state/cluster.go | 0 | 2 | 0.0%
pkg/controllers/provisioning/provisioner.go | 11 | 14 | 78.57%
pkg/scheduling/volumeusage.go | 0 | 3 | 0.0%
pkg/controllers/disruption/controller.go | 2 | 6 | 33.33%
Total: | 98 | 150 |

Files with Coverage Reduction | New Missed Lines | %
---|---|---
pkg/operator/logging/logging.go | 1 | 0.0%
pkg/controllers/provisioning/scheduling/nodeclaim.go | 2 | 89.01%
Total: | 3 |

Totals |
---|---
Change from base Build 9164948795: | 0.1%
Covered Lines: | 8275
Relevant Lines: | 10533
💛 - Coveralls
@fmuyassarov love this! Can you include a run with the kwok provider, scaling up and down, to show what these logs look like?
Absolutely. Will update soon after having results.
@Bryce-Soghigian can comment more on the Azure provider, but the other thing to keep in mind here is that this change is going to require the cloud providers to shift which logging package they're using as well, so we need to coordinate this change with a change in the cloud provider code so that cloud providers aren't blocked from upgrading to this newer version for too long.
@fmuyassarov Is this also something that you would be interested in taking up?
Sure, is there a common mailing list to broadcast this change to the providers? I would expect the community meeting is a good place to start, but perhaps there are other channels I'm not aware of.
I would expect community meeting is a good place to start but perhaps there are some other channels which I'm not aware of.
Right now we only have two providers so this shouldn't be a huge deal. @Bryce-Soghigian is already replying to this PR so I think that this should be fine. I think bringing this up in WG is still a good callout just to make sure that we do our due diligence when we make this change here.
@fmuyassarov would love to see this get merged. Would you be interested in also migrating the azure provider to use the controller-runtime injected logger?
Would you be interested in also migrating the azure provider to
Sure, why not.
Done and I also prefer ctrl over controllerruntime.
On Thu, Mar 7, 2024, Bryce Soghigian commented on this pull request, in pkg/operator/controller/controller.go (https://github.com/kubernetes-sigs/karpenter/pull/1050#discussion_r1515589537):

```diff
-// insider of the builder. Typed reference implementations, see controllerruntime.Builder
+// insider of the builder. Typed reference implementations, see ctrl.Builder
 type Builder interface {
 	// Complete builds a builder by registering the Reconciler with the manager
 	Complete(Reconciler) error
 }

-// Adapter adapts a controllerruntime.Builder into the Builder interface
+// Adapter adapts a ctrl.Builder into the Builder interface
 type Adapter struct {
-	builder *controllerruntime.Builder
+	builder *ctrl.Builder
 }

-func Adapt(builder *controllerruntime.Builder) Builder {
+func Adapt(builder *ctrl.Builder) Builder {
 	return &Adapter{
 		builder: builder,
 	}
 }
```
nit(style): seems we use controllerruntime everywhere else. I like ctrl more because it's shorter, and logging is one of those things you type out a lot. Thoughts on picking either ctrl or controllerruntime for a uniform reference throughout the codebase?
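The `ctrl` alias discussed above is the import convention used in controller-runtime's own examples; a minimal sketch of how the aliased import reads at a call site (`NewBuilder` is a hypothetical helper, not from the codebase):

```go
package sample

import (
	// Aliased import: ctrl.Builder is the same type as controllerruntime.Builder,
	// just under a shorter name.
	ctrl "sigs.k8s.io/controller-runtime"
)

// NewBuilder is a hypothetical helper illustrating the shorter alias.
func NewBuilder(mgr ctrl.Manager) *ctrl.Builder {
	return ctrl.NewControllerManagedBy(mgr)
}
```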
rebased
@fmuyassarov love this, can you include a run of with the kwok provider scale up and scale down of what these logs look like?
Hi @Bryce-Soghigian Just had a moment to give it a try, and it appears to be functioning as expected (I think). The log output after the fix is the snippet shown in the PR description above.
/hold cancel
Hey @jonathan-innis. Would you mind giving this PR another look, please?
rebased
Just had a moment to give it a try, and it appears to be functioning as expected (I think). Here's a quick log snippet after the fix
Can you add this to the description?
@jonathan-innis I think there are no open issues left to be addressed currently, apart from what was already discussed here regarding the clarity of instructions for using trace, debug, or info messages. Perhaps we can iterate on that in follow-ups. We could consider leaving the tracking issue https://github.com/kubernetes-sigs/karpenter/issues/922 open for a while until we achieve the desired state. If this approach sounds reasonable, can we take this PR in?
Rebased
/assign @jonathan-innis
Thanks for the comments @jonathan-innis. I have updated all the required places and will push later today, once I've had time to test it once more.
@fmuyassarov I think we're getting close 🚀
Changing from a "perf" PR to "chore" since this isn't directly intended to improve our performance; it's effectively a cleanup task that we've needed for quite a while.
`klog` seems to be using the fallback logger. I think this gets resolved if we take this comment in:
{"level":"info","ts":1714372922.0198948,"logger":"fallback","caller":"leaderelection/leaderelection.go:250","msg":"attempting to acquire leader lease kube-system/karpenter-leader-election..."}
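The fix being referenced can be sketched roughly as follows. This is a hedged sketch, not the PR's actual code: components that log through `klog` (such as client-go's leader election) fall back to a default logger unless `klog` is explicitly pointed at the same `logr.Logger` that controller-runtime uses, so both should be wired up early in `main()`.

```go
package main

import (
	"k8s.io/klog/v2"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	// In the actual PR the logger would come from Karpenter's own logging
	// setup (carrying the user's Zap configuration); zap.New() is used here
	// only as a stand-in for illustration.
	logger := zap.New()

	// Route controller-runtime's logging through the configured logger.
	ctrl.SetLogger(logger)

	// Route klog through the same logger so that leader-election messages
	// no longer appear under the "fallback" logger name.
	klog.SetLogger(logger)
}
```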
@fmuyassarov I was testing out the changes with https://github.com/aws/karpenter-provider-aws/pull/6150. I made a few updates to the Error and Info templating since it looks like we missed a few places where we had templating before, where we now have to call `fmt.Sprintf`.
Thanks @jonathan-innis. It's odd that I didn't catch those issues when I tested the AWS provider, but I'm +1 on those fixes.
There were a couple of instances where I used `fmt.Sprintf` only because logging would fail otherwise. Anyway, the second commit looks good to me, though I should admit I haven't tested it yet. Would you prefer to keep the commits separate, or should I squash them into one?
> Would you prefer to keep the commits separated or should I squash them into one

Seems fine to keep them separate since we don't rebase merge anyway; we squash merge. I also realized that controller-runtime uses the message "Reconciler error" for all of its error messaging. The fact that `logr.Logger` basically forces us to provide both an error and a message is highly annoying IMO. Considering that controller-runtime puts all of its context in the error directly and just gives a top-level "Reconciler error" as the message, I think it makes sense to make this consistent throughout. I updated the locations where we fire errors to put all of the context into the error message with `fmt.Errorf`, and then to put something like "Provisioner error" or "Scheduler error" as the message.
One more update: we need to make sure that we are generating the logger from `logging.NewLogger()` so that we take in all of the Zap configuration that was passed by the user. Added a commit to ensure that we use the correct logging initialization here. Everything else looks good.
@jonathan-innis what's the plan with this PR?