aws-efs-csi-driver
Logs flooded by Connecting to unix...
/kind bug
What happened? Logs are flooded by:
efs-csi-node-k6lkl liveness-probe I1012 10:17:54.811549 1 connection.go:153] Connecting to unix:///csi/csi.sock
What you expected to happen? Not to see this log, at least at log level 1.
How to reproduce it (as minimally and precisely as possible)? Just deploy the driver and look at the logs.
Anything else we need to know?:
Environment
- Kubernetes version (use kubectl version):
- Driver version: v2.2.0-eks-1-18-2
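To watch the flood directly, you can tail the liveness-probe sidecar of a node pod (pod name taken from the log line above; substitute one from your own cluster):
# prints the repeating "Connecting to unix:///csi/csi.sock" lines from the sidecar
kubectl -n kube-system logs efs-csi-node-k6lkl -c liveness-probe --tail=20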
https://github.com/kubernetes-csi/livenessprobe/issues/110 has the fix; we need to bump the liveness probe version.
https://github.com/kubernetes-sigs/aws-ebs-csi-driver/pull/1054 is fixing the same problem in the EBS driver
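Before bumping anything, it may help to check which livenessprobe tag is currently deployed; the deployment and daemonset names below assume a default Helm install, so adjust them if yours differ:
# liveness-probe sidecar image on the controller deployment
kubectl -n kube-system get deployment efs-csi-controller \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="liveness-probe")].image}'
# liveness-probe sidecar image on the node daemonset
kubectl -n kube-system get daemonset efs-csi-node \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="liveness-probe")].image}'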
For a sizeable cluster, such rapid logging can eat up a lot of disk space.
Please address this; it's such an easy fix.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
BTW: this can be fixed by overriding the default values file with:
sidecars:
  livenessProbe:
    image:
      tag: v2.6.0-eks-1-21-13
so https://github.com/kubernetes-sigs/aws-efs-csi-driver/pull/595 is essentially more complicated than necessary, since it also changes the container registry.
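For reference, one way to apply such an override (assuming the driver was installed from the upstream Helm repo; adjust the release and chart names to your setup):
# add/refresh the upstream chart repo, then upgrade with the override file
helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
helm repo update
helm upgrade aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
  --namespace kube-system --reuse-values -f values-override.yaml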
If the above-mentioned solution does not work for you and you are using a k8s version <1.12, then look at the compatible images at: https://gallery.ecr.aws/eks-distro/kubernetes-csi/livenessprobe
If I find the time, I'll create a PR to update the livenessProbe image to v2.6.
When the driver was installed with Helm, I used this command to update the image:
kubectl -nkube-system set image deployment/efs-csi-controller liveness-probe=public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe:v2.6.0-eks-1-18-17
Your setup may vary -- namespace, deployment name.
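The log line in the original report comes from a node pod, and the node daemonset runs the same liveness-probe sidecar, so it presumably needs the same bump; a hedged equivalent for a default install (the daemonset name may differ in your setup):
# update the liveness-probe sidecar on the node daemonset as well
kubectl -n kube-system set image daemonset/efs-csi-node \
  liveness-probe=public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe:v2.6.0-eks-1-18-17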
EDIT: After this change it didn't improve. I'm wondering whether the deployment setup is as it should be (I've edited out some non-relevant parts):
% k -nkube-system describe deployment/efs-csi-controller
Name:                   efs-csi-controller
Namespace:              kube-system
CreationTimestamp:      Wed, 01 Jun 2022 14:14:28 +0300
Labels:                 app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=aws-efs-csi-driver
Annotations:            deployment.kubernetes.io/revision: 4
                        meta.helm.sh/release-name: aws-efs-csi-driver
                        meta.helm.sh/release-namespace: kube-system
Selector:               app=efs-csi-controller,app.kubernetes.io/instance=aws-efs-csi-driver,app.kubernetes.io/name=aws-efs-csi-driver
Replicas:               2 desired | 1 updated | 3 total | 2 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=efs-csi-controller
                    app.kubernetes.io/instance=aws-efs-csi-driver
                    app.kubernetes.io/name=aws-efs-csi-driver
  Service Account:  efs-csi-controller-sa
  Containers:
   efs-plugin:
    Image:      amazon/aws-efs-csi-driver:v1.3.8
    Port:       9909/TCP
    Host Port:  9909/TCP
    Args:
      --endpoint=$(CSI_ENDPOINT)
      --logtostderr
      --v=2
      --delete-access-point-root-dir=false
    Liveness:  http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
    Environment:
      CSI_ENDPOINT:  unix:///var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
   csi-provisioner:
    Image:      public.ecr.aws/eks-distro/kubernetes-csi/external-provisioner:v2.1.1-eks-1-18-2
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=2
      --feature-gates=Topology=true
      --extra-create-metadata
      --leader-election
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
   liveness-probe:
    Image:      public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe:v2.6.0-eks-1-18-17
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=/csi/csi.sock
      --health-port=9909
    Environment:  <none>
    Mounts:
      /csi from socket-dir (rw)
  Volumes:
   socket-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  Priority Class Name:  system-cluster-critical
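One sanity check worth running after an image change like this (the pod labels and container name below assume the default chart; note that the node daemonset has its own liveness-probe sidecar, and the flooded log in the original report came from an efs-csi-node pod):
# images the controller pods are actually running
kubectl -n kube-system get pods -l app=efs-csi-controller \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[?(@.name=="liveness-probe")].image}{"\n"}{end}'
# and the same for the node pods
kubectl -n kube-system get pods -l app=efs-csi-node \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[?(@.name=="liveness-probe")].image}{"\n"}{end}'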
The issue is still present in the latest chart. I managed to fix it by downloading the chart and replacing sidecars.livenessProbe.image.tag:
sidecars:
  livenessProbe:
    image:
      repository: public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe
      tag: v2.6.0-eks-1-18-17 # was v2.2.0-eks-1-18-13
Then applied the upgrade:
helm upgrade -f values.yaml --namespace kube-system --set controller.serviceAccount.create=false --set controller.serviceAccount.name=efs-csi-controller-sa aws-efs-csi-driver ../aws-efs-csi-driver-2.2.7.tgz
The parameter can probably be overridden when invoking helm instead of changing values.yaml, but I don't use Helm daily, so I didn't bother to look it up.
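A hedged sketch of that --set form, assuming the release name and upstream chart used elsewhere in this thread:
# same override, but passed on the command line instead of via values.yaml
helm upgrade aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
  --namespace kube-system --reuse-values \
  --set sidecars.livenessProbe.image.tag=v2.6.0-eks-1-18-17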
Please fix the helm chart :)
issue: connection.go:153] Connecting to unix:///csi/csi.sock
Can someone explain what the root cause of this noisy log line (connection.go:153] Connecting to unix:///csi/csi.sock) is? Does it relate to or cause errors? How did the image version v2.6.0-eks-1-18-17 solve the log flooding? Please tell me if anyone knows. Thanks :)
https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/564#issuecomment-949517622 has the link to the upstream liveness probe issue; basically it comes down to a noisy logging decision in a subcomponent (the Go connection library).
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Is this fixed in v1.4.9?
Hi @tavin, yes this is fixed in v1.4.9.
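To check which driver version a cluster is actually running (deployment and container names taken from the describe output earlier in this thread):
# prints the efs-plugin image, e.g. amazon/aws-efs-csi-driver:v1.4.9 or newer
kubectl -n kube-system get deployment efs-csi-controller \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="efs-plugin")].image}'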
/close
@mskanth972: Closing this issue.
In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.