cluster-api

:seedling: Allows to redefine ETCD client logger

Open dmvolod opened this issue 6 months ago • 9 comments

What this PR does / why we need it: We need to be able to adjust the etcd client log level to avoid an unpredictable number of warnings in the log, for example:

{"level":"warn","ts":"2025-05-22T16:36:08.530926Z","caller":"[email protected]/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc016cf01e0/etcd-test-control-plane-dx4r5","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: error upgrading connection: error sending request: Post \\\"https://XXX.XX.XX.XX:6443/api/v1/namespaces/kube-system/pods/etcd-test-control-plane-dx4r5/portforward?timeout=10s\\\": EOF\""}
{"level":"warn","ts":"2025-05-22T16:36:23.642869Z","caller":"[email protected]/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0041314a0/etcd-test-control-plane-dx4r5","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = received context error while waiting for new LB policy update: context deadline exceeded"}
{"level":"warn","ts":"2025-05-22T16:37:11.809235Z","caller":"[email protected]/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc01595fa40/etcd-test-control-plane-dx4r5","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: error upgrading connection: error sending request: Post \\\"https://XXX.XX.XX.XX:6443/api/v1/namespaces/kube-system/pods/etcd-test-control-plane-dx4r5/portforward?timeout=10s\\\": EOF\""}
{"level":"warn","ts":"2025-05-22T16:37:30.158754Z","caller":"[email protected]/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc009898f00/etcd-test-control-plane-dx4r5","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = received context error while waiting for new LB policy update: context deadline exceeded"}

Currently, however, the etcd client log level is hardcoded to zapcore.InfoLevel. This PR allows it to be redefined at init time in the embedded controllers. If it also needs to be configurable globally, via an environment variable or a command-line option for any deployment, please let me know.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #

dmvolod avatar May 22 '25 17:05 dmvolod

/area provider/control-plane-kubeadm

dmvolod avatar May 22 '25 17:05 dmvolod

Wondering if kubernetes has a similar use case and how are they handling it (e.g. in the kube-apiserver)

sbueringer avatar May 22 '25 19:05 sbueringer

> Wondering if kubernetes has a similar use case and how are they handling it (e.g. in the kube-apiserver)

This is a very good point. It doesn't work in kube-apiserver at the moment, because the community copied (https://github.com/kubernetes/kubernetes/blob/b35c5c0a301d326fdfa353943fca077778544ac6/staging/src/k8s.io/apiserver/pkg/storage/storagebackend/factory/etcd3.go#L97-L111) the etcdClientDebugLevel function from etcd, and that copy is now out of date. I plan to create a similar PR for Kubernetes as well.

I requested an export of this function in the PR https://github.com/etcd-io/etcd/pull/20006. But as far as I understand, there are no plans to backport it to etcd 3.5.

The only downside to this approach is that it won't work natively. We can support both approaches at the same time to get a universal solution: set the logger via a function, and change the logging level via the ETCD_CLIENT_DEBUG variable. The catch is that, for now, we would also have to copy/paste the function until etcd 3.6 with this fix is released and we switch to it. I could create an issue to track this.
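As an illustration of the ETCD_CLIENT_DEBUG part of that idea, here is a minimal sketch of mapping the variable to a log level. It is not the etcd or kube-apiserver function. The `Level` type is a local stand-in that mirrors the numeric values of zapcore.Level so the example compiles without the zap module, and `levelFromEnv`, the accepted strings, and the case-insensitive parsing are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Level is a local stand-in mirroring the numeric values of zapcore.Level
// (debug = -1, info = 0, warn = 1, error = 2). Assumption: real code would
// use zapcore.Level directly.
type Level int8

const (
	DebugLevel Level = iota - 1
	InfoLevel
	WarnLevel
	ErrorLevel
)

// levelFromEnv is a hypothetical helper: it maps the raw value of
// ETCD_CLIENT_DEBUG to a Level and falls back to InfoLevel.
func levelFromEnv(raw string) Level {
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "debug":
		return DebugLevel
	case "warn":
		return WarnLevel
	case "error":
		return ErrorLevel
	default:
		// Empty, "info", or unrecognized values keep the current default.
		return InfoLevel
	}
}

func main() {
	lvl := levelFromEnv(os.Getenv("ETCD_CLIENT_DEBUG"))
	fmt.Printf("etcd client log level: %d\n", lvl)
}
```

With ETCD_CLIENT_DEBUG=error, the warnings quoted in the PR description would fall below the configured level, while an unset variable keeps the existing info default.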

dmvolod avatar May 22 '25 19:05 dmvolod

I was wondering if there is a way to "reuse" our regular log level for the etcd logger

sbueringer avatar May 23 '25 15:05 sbueringer

> I was wondering if there is a way to "reuse" our regular log level for the etcd logger

It seems this could be possible with a trick like the following, but it should be validated:

func (r *KubeadmControlPlaneReconciler) SetupWithManager(ctx context.Context, mgr ctrl.Manager, options controller.Options) error {
	logger, _ := logutil.CreateDefaultZapLogger(zapcore.Level(ctrl.LoggerFrom(ctx).GetV()))
	etcd.SetLogger(logger)

But in this case, it would be nice to be able to redefine the log destination as well, in addition to the log level. SetLogger will help with that.
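One detail worth checking when validating this: logr verbosity and zap levels run in opposite directions. The zapr adapter maps logr V(n) to zap level -n, so a direct cast of a positive verbosity to zapcore.Level would select a less verbose level (1 is warn in zap) rather than a more verbose one. A minimal sketch of the conversion, using a plain int8 in place of zapcore.Level so it runs without the zap module; `zapLevelForVerbosity` is a hypothetical name, and where the verbosity value comes from is left open:

```go
package main

import "fmt"

// zapLevelForVerbosity is a hypothetical helper: it converts a logr-style
// verbosity (0 = info, higher = more verbose) into the numeric value of the
// corresponding zap level (0 = info, negative = more verbose).
func zapLevelForVerbosity(v int) int8 {
	if v < 0 {
		v = 0 // negative verbosity is treated as info
	}
	if v > 127 {
		v = 127 // clamp so the negation fits in int8
	}
	return int8(-v)
}

func main() {
	for _, v := range []int{0, 1, 4} {
		fmt.Printf("V(%d) -> zap level %d\n", v, zapLevelForVerbosity(v))
	}
}
```

In real code the result would be converted with zapcore.Level(...) before being passed to a logger constructor.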

dmvolod avatar May 23 '25 18:05 dmvolod

/test pull-cluster-api-test-main

dmvolod avatar Jun 10 '25 07:06 dmvolod

/retest

dmvolod avatar Jun 13 '25 08:06 dmvolod

/test pull-cluster-api-test-main

dmvolod avatar Jun 23 '25 05:06 dmvolod

Thx!

/lgtm
/approve

sbueringer avatar Jul 08 '25 13:07 sbueringer

LGTM label has been added.

Git tree hash: c40e44e7fbe2c9de639467e13dbc8a55d59729b8

k8s-ci-robot avatar Jul 08 '25 13:07 k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

k8s-ci-robot avatar Jul 08 '25 13:07 k8s-ci-robot

/cherry-pick release-1.10

dmvolod avatar Jul 08 '25 13:07 dmvolod

@dmvolod: new pull request created: #12463

In response to this:

> /cherry-pick release-1.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

/cherry-pick release-1.9

sbueringer avatar Jul 09 '25 14:07 sbueringer

@sbueringer: #12271 failed to apply on top of branch "release-1.9":

Applying: seedling: Allows to redefine ETCD client logger
Using index info to reconstruct a base tree...
M	controlplane/kubeadm/controllers/alias.go
M	controlplane/kubeadm/internal/cluster.go
M	controlplane/kubeadm/internal/controllers/controller.go
M	controlplane/kubeadm/internal/etcd/etcd.go
M	controlplane/kubeadm/internal/etcd_client_generator.go
M	controlplane/kubeadm/internal/etcd_client_generator_test.go
M	controlplane/kubeadm/main.go
Falling back to patching base and 3-way merge...
Auto-merging controlplane/kubeadm/main.go
CONFLICT (content): Merge conflict in controlplane/kubeadm/main.go
Auto-merging controlplane/kubeadm/internal/etcd_client_generator_test.go
Auto-merging controlplane/kubeadm/internal/etcd_client_generator.go
Auto-merging controlplane/kubeadm/internal/etcd/etcd.go
Auto-merging controlplane/kubeadm/internal/controllers/controller.go
Auto-merging controlplane/kubeadm/internal/cluster.go
Auto-merging controlplane/kubeadm/controllers/alias.go
CONFLICT (content): Merge conflict in controlplane/kubeadm/controllers/alias.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 seedling: Allows to redefine ETCD client logger

In response to this:

> /cherry-pick release-1.9