karpenter icon indicating copy to clipboard operation
karpenter copied to clipboard

chore: migrate knative logger to controller-runtime injected logger

Open fmuyassarov opened this issue 11 months ago • 30 comments

Fixes: https://github.com/kubernetes-sigs/karpenter/issues/922

Description Migrate knative logger to controller-runtime injected logger

After this change the logging messages look like below

{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller","msg":"discovered karpenter version","version":"unspecified"}
{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller","msg":"webhook disabled"}
{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8000","secure":false}
{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller","msg":"starting server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":1710411206.3906345,"logger":"fallback","caller":"leaderelection/leaderelection.go:250","msg":"attempting to acquire leader lease kube-system/karpenter-leader-election..."}
{"level":"info","ts":1710411221.86726,"logger":"fallback","caller":"leaderelection/leaderelection.go:260","msg":"successfully acquired lease kube-system/karpenter-leader-election"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.provisioner","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.disruption.queue","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.eviction-queue","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"provisioner.trigger.pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"provisioner.trigger.pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"provisioner.trigger.pod","controllerGroup":"","controllerKind":"Pod","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.disruption","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodepool.hash","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodepool.hash","controllerGroup":"karpenter.sh","controllerKind":"NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"provisioner.trigger.node","controllerGroup":"","controllerKind":"Node","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"provisioner.trigger.node","controllerGroup":"","controllerKind":"Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.node","controllerGroup":"","controllerKind":"Node","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.node","controllerGroup":"","controllerKind":"Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.daemonset","controllerGroup":"apps","controllerKind":"DaemonSet","source":"kind source: *v1.DaemonSet"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.daemonset","controllerGroup":"apps","controllerKind":"DaemonSet"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"node.termination","controllerGroup":"","controllerKind":"Node","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"node.termination","controllerGroup":"","controllerKind":"Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.nodeclaim","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.nodeclaim","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"metrics.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"metrics.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"metrics.pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"metrics.pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.metrics.node","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.nodeclaim.garbagecollection","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"lease.garbagecollection","controllerGroup":"coordination.k8s.io","controllerKind":"Lease","source":"kind source: *v1.Lease"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"lease.garbagecollection","controllerGroup":"coordination.k8s.io","controllerKind":"Lease"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.disruption","msg":"waiting on cluster sync"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodepool.hash","controllerGroup":"karpenter.sh","controllerKind":"NodePool","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"provisioner.trigger.node","controllerGroup":"","controllerKind":"Node","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"state.node","controllerGroup":"","controllerKind":"Node","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"state.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"state.pod","controllerGroup":"","controllerKind":"Pod","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"node.termination","controllerGroup":"","controllerKind":"Node","worker count":100}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"state.nodeclaim","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"metrics.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool","worker count":1}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"metrics.pod","controllerGroup":"","controllerKind":"Pod","worker count":1}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":1000}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":100}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:42Z","logger":"controller","msg":"Starting workers","controller":"lease.garbagecollection","controllerGroup":"coordination.k8s.io","controllerKind":"Lease","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:42Z","logger":"controller","msg":"Starting workers","controller":"state.daemonset","controllerGroup":"apps","controllerKind":"DaemonSet","worker count":10}

How was this change tested?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

fmuyassarov avatar Feb 28 '24 11:02 fmuyassarov

/hold

fmuyassarov avatar Feb 28 '24 11:02 fmuyassarov

ping @jonathan-innis

fmuyassarov avatar Feb 28 '24 11:02 fmuyassarov

Pull Request Test Coverage Report for Build 9165359761

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 98 of 150 (65.33%) changed or added relevant lines in 46 files are covered.
  • 3 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.1%) to 78.563%

Changes Missing Coverage Covered Lines Changed/Added Lines %
kwok/main.go 0 1 0.0%
pkg/controllers/disruption/helpers.go 3 4 75.0%
pkg/controllers/disruption/types.go 0 1 0.0%
pkg/controllers/provisioning/scheduling/scheduler.go 6 7 85.71%
pkg/controllers/disruption/multinodeconsolidation.go 1 3 33.33%
pkg/controllers/disruption/singlenodeconsolidation.go 1 3 33.33%
pkg/controllers/state/cluster.go 0 2 0.0%
pkg/controllers/provisioning/provisioner.go 11 14 78.57%
pkg/scheduling/volumeusage.go 0 3 0.0%
pkg/controllers/disruption/controller.go 2 6 33.33%
<!-- Total: 98 150
Files with Coverage Reduction New Missed Lines %
pkg/operator/logging/logging.go 1 0.0%
pkg/controllers/provisioning/scheduling/nodeclaim.go 2 89.01%
<!-- Total: 3
Totals Coverage Status
Change from base Build 9164948795: 0.1%
Covered Lines: 8275
Relevant Lines: 10533

💛 - Coveralls

coveralls avatar Feb 28 '24 14:02 coveralls

@fmuyassarov love this, can you include a run of with the kwok provider scale up and scale down of what these logs look like?

Bryce-Soghigian avatar Feb 28 '24 19:02 Bryce-Soghigian

@fmuyassarov love this, can you include a run of with the kwok provider scale up and scale down of what these logs look like?

Absolutely. Will update soon after having results.

fmuyassarov avatar Feb 28 '24 22:02 fmuyassarov

@Bryce-Soghigian Can comment more on the Azure provider, but the other thing to keep in mind here is that this change is going to cause the cloud providers to have to shift what logging package it's using as well, so we need to coordinate this change with a change in the cloud provider code so that cloud providers aren't blocked from upgrading to this newer version for too long.

@fmuyassarov Is this also something that you would be interested in taking up?

jonathan-innis avatar Mar 02 '24 23:03 jonathan-innis

@Bryce-Soghigian Can comment more on the Azure provider, but the other thing to keep in mind here is that this change is going to cause the cloud providers to have to shift what logging package it's using as well, so we need to coordinate this change with a change in the cloud provider code so that cloud providers aren't blocked from upgrading to this newer version for too long.

@fmuyassarov Is this also something that you would be interested in taking up?

Sure, is there a common mailing list to broadcast the this change to the providers? I would expect community meeting is a good place to start but perhaps there are some other channels which I'm not aware of.

fmuyassarov avatar Mar 04 '24 20:03 fmuyassarov

I would expect community meeting is a good place to start but perhaps there are some other channels which I'm not aware of.

Right now we only have two providers so this shouldn't be a huge deal. @Bryce-Soghigian is already replying to this PR so I think that this should be fine. I think bringing this up in WG is still a good callout just to make sure that we do our due diligence when we make this change here.

jonathan-innis avatar Mar 06 '24 04:03 jonathan-innis

@fmuyassarov would love to see this get merged. Would you be interested in also migrating the azure provider to use the controller-runtime injected logger?

Bryce-Soghigian avatar Mar 06 '24 09:03 Bryce-Soghigian

Would you be interested in also migrating the azure provider to

Sure, why not.

fmuyassarov avatar Mar 06 '24 11:03 fmuyassarov

Done and I also prefer ctrl over controllerruntime.

On Thu, Mar 7, 2024 at 8:00 AM Bryce Soghigian @.***> wrote:

@.**** commented on this pull request.

In pkg/operator/controller/controller.go https://github.com/kubernetes-sigs/karpenter/pull/1050#discussion_r1515589537 :

-// insider of the builder. Typed reference implementations, see controllerruntime.Builder +// insider of the builder. Typed reference implementations, see ctrl.Builder type Builder interface { // Complete builds a builder by registering the Reconciler with the manager Complete(Reconciler) error }

-// Adapter adapts a controllerruntime.Builder into the Builder interface +// Adapter adapts a ctrl.Builder into the Builder interface type Adapter struct {

  • builder *controllerruntime.Builder
  • builder *ctrl.Builder }

-func Adapt(builder *controllerruntime.Builder) Builder { +func Adapt(builder *ctrl.Builder) Builder { return &Adapter{ builder: builder, }

nit(style): seems we use controllerruntime everywhere else. I like ctrl more because its shorter and logging is one of those things you type out a lot. Thoughts on picking either ctrl or controllerruntime to have a uniform reference throughout the codebase?

— Reply to this email directly, view it on GitHub https://github.com/kubernetes-sigs/karpenter/pull/1050#pullrequestreview-1921507237, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIRE3PPZGY26PDDRQLUTNMDYW77A3AVCNFSM6AAAAABD52TQP6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTSMRRGUYDOMRTG4 . You are receiving this because you were mentioned.Message ID: @.***>

fmuyassarov avatar Mar 07 '24 11:03 fmuyassarov

rebased

fmuyassarov avatar Mar 12 '24 08:03 fmuyassarov

@fmuyassarov love this, can you include a run of with the kwok provider scale up and scale down of what these logs look like?

Hi @Bryce-Soghigian Just had a moment to give it a try, and it appears to be functioning as expected (I think). Here's a quick log snippet after the fix.

{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller","msg":"discovered karpenter version","version":"unspecified"}
{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller","msg":"webhook disabled"}
{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8000","secure":false}
{"level":"info","ts":"2024-03-14T10:13:26Z","logger":"controller","msg":"starting server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":1710411206.3906345,"logger":"fallback","caller":"leaderelection/leaderelection.go:250","msg":"attempting to acquire leader lease kube-system/karpenter-leader-election..."}
{"level":"info","ts":1710411221.86726,"logger":"fallback","caller":"leaderelection/leaderelection.go:260","msg":"successfully acquired lease kube-system/karpenter-leader-election"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.provisioner","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.disruption.queue","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.eviction-queue","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"provisioner.trigger.pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"provisioner.trigger.pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"provisioner.trigger.pod","controllerGroup":"","controllerKind":"Pod","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.disruption","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodepool.hash","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodepool.hash","controllerGroup":"karpenter.sh","controllerKind":"NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"provisioner.trigger.node","controllerGroup":"","controllerKind":"Node","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"provisioner.trigger.node","controllerGroup":"","controllerKind":"Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.node","controllerGroup":"","controllerKind":"Node","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.node","controllerGroup":"","controllerKind":"Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.daemonset","controllerGroup":"apps","controllerKind":"DaemonSet","source":"kind source: *v1.DaemonSet"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.daemonset","controllerGroup":"apps","controllerKind":"DaemonSet"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"node.termination","controllerGroup":"","controllerKind":"Node","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"node.termination","controllerGroup":"","controllerKind":"Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"state.nodeclaim","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"state.nodeclaim","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"metrics.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"metrics.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"metrics.pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"metrics.pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.metrics.node","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1.Node"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.nodeclaim.garbagecollection","msg":"starting controller"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"lease.garbagecollection","controllerGroup":"coordination.k8s.io","controllerKind":"Lease","source":"kind source: *v1.Lease"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"lease.garbagecollection","controllerGroup":"coordination.k8s.io","controllerKind":"Lease"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1beta1.NodePool"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting EventSource","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting Controller","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller.disruption","msg":"waiting on cluster sync"}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodepool.hash","controllerGroup":"karpenter.sh","controllerKind":"NodePool","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"provisioner.trigger.node","controllerGroup":"","controllerKind":"Node","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"state.node","controllerGroup":"","controllerKind":"Node","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"state.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"state.pod","controllerGroup":"","controllerKind":"Pod","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"node.termination","controllerGroup":"","controllerKind":"Node","worker count":100}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"state.nodeclaim","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"metrics.nodepool","controllerGroup":"karpenter.sh","controllerKind":"NodePool","worker count":1}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"metrics.pod","controllerGroup":"","controllerKind":"Pod","worker count":1}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodepool.counter","controllerGroup":"karpenter.sh","controllerKind":"NodePool","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":1000}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodeclaim.consistency","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodeclaim.termination","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":100}
{"level":"info","ts":"2024-03-14T10:13:41Z","logger":"controller","msg":"Starting workers","controller":"nodeclaim.disruption","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:42Z","logger":"controller","msg":"Starting workers","controller":"lease.garbagecollection","controllerGroup":"coordination.k8s.io","controllerKind":"Lease","worker count":10}
{"level":"info","ts":"2024-03-14T10:13:42Z","logger":"controller","msg":"Starting workers","controller":"state.daemonset","controllerGroup":"apps","controllerKind":"DaemonSet","worker count":10}

fmuyassarov avatar Mar 14 '24 10:03 fmuyassarov

/hold cancel

fmuyassarov avatar Mar 14 '24 10:03 fmuyassarov

Hey @jonathan-innis . Would you mind to give another look at this PR please?

fmuyassarov avatar Mar 15 '24 20:03 fmuyassarov

rebased

fmuyassarov avatar Mar 19 '24 08:03 fmuyassarov

Just had a moment to give it a try, and it appears to be functioning as expected (I think). Here's a quick log snippet after the fix

Can you add this to the description?

jonathan-innis avatar Mar 19 '24 10:03 jonathan-innis

@jonathan-innis I think there are no open issues left to be addresses currently, apart from what was already discussed here regarding the clarity of instructions for using trace, debug, or info messages. Perhaps we can have some iterations on this task for further improvements. We could consider leaving the tracking issue https://github.com/kubernetes-sigs/karpenter/issues/922 open for a while until we achieve the desired state. if this approach sounds reasonable, can we maybe take this PR in?

fmuyassarov avatar Mar 21 '24 02:03 fmuyassarov

Rebased

fmuyassarov avatar Apr 02 '24 08:04 fmuyassarov

/assign @jonathan-innis

jonathan-innis avatar Apr 16 '24 18:04 jonathan-innis

Thanks for the comments @jonathan-innis . I have updated all the required places and will push it later on today after I get time to test it once more.

fmuyassarov avatar Apr 16 '24 18:04 fmuyassarov

@fmuyassarov I think we're getting close 🚀

jonathan-innis avatar Apr 22 '24 07:04 jonathan-innis

Changing from a "perf" PR to "chore" since this isn't directly intending to improve our performance, this is effectively a cleanup task that we've needed for quite a while

jonathan-innis avatar Apr 22 '24 07:04 jonathan-innis

klog seems to be using the fallback logger. I think this gets resolved if we take this comment in

{"level":"info","ts":1714372922.0198948,"logger":"fallback","caller":"leaderelection/leaderelection.go:250","msg":"attempting to acquire leader lease kube-system/karpenter-leader-election..."}

jonathan-innis avatar Apr 29 '24 06:04 jonathan-innis

@fmuyassarov I was testing out the changes with https://github.com/aws/karpenter-provider-aws/pull/6150. I made a few updates to the Error and Info templating since it looks like we missed a few places where we had templating before, where we now have to call fmt.Sprintf

jonathan-innis avatar May 05 '24 06:05 jonathan-innis

@fmuyassarov I was testing out the changes with aws/karpenter-provider-aws#6150. I made a few updates to the Error and Info templating since it looks like we missed a few places where we had templating before, where we now have to call fmt.Sprintf

Thanks @jonathan-innis. It's odd that when I tested aws provider I didn't catch those issues. But I'm +1 for those fixes.

fmuyassarov avatar May 05 '24 06:05 fmuyassarov

@fmuyassarov I was testing out the changes with https://github.com/aws/karpenter-provider-aws/pull/6150. I made a few updates to the Error and Info templating since it looks like we missed a few places where we had templating before, where we now have to call fmt.Sprintf

There were a couple of instances where I used fmt.Sprintf but added it because otherwise logging would fail. Anyways, the second commit looks good to me. I should admit I didn't test it yet. Would you prefer to keep the commits separated or should I squash them into one?

fmuyassarov avatar May 05 '24 06:05 fmuyassarov

Would you prefer to keep the commits separated or should I squash them into one

Seems fine to keep them separated since we don't rebase merge anyways; we squash merge. I also realized that controller-runtime uses the messaging "Reconciler error" for all of its error messaging. The fact that logr.Logger basically forces us to provide both an error and a message is highly annoying IMO. Considering controller-runtime basically puts all of its context in the error directly and just gives a top-level "Reconciler error" for the message, I think it makes sense t make this consistent throughout. I updated the locations where we are firing errors to put all of the context into the error message with fmt.Errorf and then to put something like "Provisioner error" or "Scheduler error" as the message.

jonathan-innis avatar May 05 '24 06:05 jonathan-innis

One more update: We need to make sure that we are generating the logger from logging.NewLogger() so that we take in all of the Zap configuration that was passed by the user. Added a commit to ensure that we use the correct logging initialization here. Everything else looks good.

jonathan-innis avatar May 05 '24 22:05 jonathan-innis

@jonathan-innis what's the plan with this PR?

fmuyassarov avatar May 14 '24 19:05 fmuyassarov