aws-efs-csi-driver

Logs flooded by Connecting to unix...

Open renanqts opened this issue 3 years ago • 9 comments

/kind bug

What happened? Logs are flooded by:

    efs-csi-node-k6lkl liveness-probe I1012 10:17:54.811549 1 connection.go:153] Connecting to unix:///csi/csi.sock

What you expected to happen? Not to see this log, at least not at log level 1.

How to reproduce it (as minimally and precisely as possible)? Just deploy and see the logs

Anything else we need to know?:

Environment

  • Kubernetes version (use kubectl version):
  • Driver version: v2.2.0-eks-1-18-2

renanqts, Oct 12 '21

https://github.com/kubernetes-csi/livenessprobe/issues/110 has the fix; we need to bump the liveness probe version.

keifgwinn, Oct 22 '21

https://github.com/kubernetes-sigs/aws-ebs-csi-driver/pull/1054 is fixing the same problem in the EBS driver

keifgwinn, Oct 27 '21

For a sizeable cluster, such rapid logging can eat up A LOT of disk space.

tanvp112, Nov 25 '21

Please address this; it's such an easy fix.

keifgwinn, Dec 06 '21

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot, Mar 06 '22

/remove-lifecycle stale

keifgwinn, Mar 07 '22

BTW: this can be fixed by overriding the default values with:

        sidecars:
          livenessProbe:
            image:
              tag: v2.6.0-eks-1-21-13

So https://github.com/kubernetes-sigs/aws-efs-csi-driver/pull/595 is more complicated than necessary, since it also changes the container registry.

If the solution above does not work for you and you are using a k8s version <1.12, look for a compatible image at: https://gallery.ecr.aws/eks-distro/kubernetes-csi/livenessprobe

If I find the time, I'll create a PR to update the livenessProbe image to v2.6.

janaurka, May 05 '22

With the driver installed via Helm, I used this command to update the image:

kubectl -nkube-system set image deployment/efs-csi-controller liveness-probe=public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe:v2.6.0-eks-1-18-17

Your setup may vary (namespace, deployment name).

EDIT: This change did not help. I'm wondering whether the deployment is set up as it should be (I trimmed some irrelevant parts):

% k -nkube-system describe deployment/efs-csi-controller
Name:                   efs-csi-controller
Namespace:              kube-system
CreationTimestamp:      Wed, 01 Jun 2022 14:14:28 +0300
Labels:                 app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=aws-efs-csi-driver
Annotations:            deployment.kubernetes.io/revision: 4
                        meta.helm.sh/release-name: aws-efs-csi-driver
                        meta.helm.sh/release-namespace: kube-system
Selector:               app=efs-csi-controller,app.kubernetes.io/instance=aws-efs-csi-driver,app.kubernetes.io/name=aws-efs-csi-driver
Replicas:               2 desired | 1 updated | 3 total | 2 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=efs-csi-controller
                    app.kubernetes.io/instance=aws-efs-csi-driver
                    app.kubernetes.io/name=aws-efs-csi-driver
  Service Account:  efs-csi-controller-sa
  Containers:
   efs-plugin:
    Image:      amazon/aws-efs-csi-driver:v1.3.8
    Port:       9909/TCP
    Host Port:  9909/TCP
    Args:
      --endpoint=$(CSI_ENDPOINT)
      --logtostderr
      --v=2
      --delete-access-point-root-dir=false
    Liveness:  http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
    Environment:
      CSI_ENDPOINT:  unix:///var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
   csi-provisioner:
    Image:      public.ecr.aws/eks-distro/kubernetes-csi/external-provisioner:v2.1.1-eks-1-18-2
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=2
      --feature-gates=Topology=true
      --extra-create-metadata
      --leader-election
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
   liveness-probe:
    Image:      public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe:v2.6.0-eks-1-18-17
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=/csi/csi.sock
      --health-port=9909
    Environment:  <none>
    Mounts:
      /csi from socket-dir (rw)
  Volumes:
   socket-dir:
    Type:               EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:             
    SizeLimit:          <unset>
  Priority Class Name:  system-cluster-critical
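The log line in the original report comes from a node pod (efs-csi-node-k6lkl, container liveness-probe), while the command above only updates the controller Deployment; that may be why the flood continued. Below is a minimal sketch of a strategic merge patch that bumps the same sidecar image on the node pods. It assumes the node DaemonSet is named efs-csi-node in kube-system (inferred from the pod name, not confirmed here), and the file name liveness-probe-patch.yaml is hypothetical:

```yaml
# liveness-probe-patch.yaml (hypothetical file name)
# Strategic merge patch for the node DaemonSet (assumed name: efs-csi-node).
# Entries in "containers" are merged by "name", so only the liveness-probe
# sidecar's image changes; efs-plugin and the other containers are untouched.
spec:
  template:
    spec:
      containers:
        - name: liveness-probe
          image: public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe:v2.6.0-eks-1-18-17
```

It could be applied with kubectl -n kube-system patch daemonset efs-csi-node --patch-file liveness-probe-patch.yaml, or, equivalently, kubectl -n kube-system set image daemonset/efs-csi-node liveness-probe=public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe:v2.6.0-eks-1-18-17. For a Helm-managed release, a manual patch like this is likely to be reverted by the next helm upgrade, so changing the chart values remains the durable fix.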

ilvez, Jun 16 '22

The issue is still present in the latest chart. I managed to fix it by downloading the chart and replacing sidecars.livenessProbe.image.tag in its values.yaml:

sidecars:
  livenessProbe:
    image:
      repository: public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe
      tag: v2.6.0-eks-1-18-17 # v2.2.0-eks-1-18-13

Then applied the upgrade:

helm upgrade -f values.yaml --namespace kube-system --set controller.serviceAccount.create=false --set controller.serviceAccount.name=efs-csi-controller-sa aws-efs-csi-driver ../aws-efs-csi-driver-2.2.7.tgz

The parameter can probably be overridden on the helm command line instead of changing values.yaml, but I don't use Helm daily, so I didn't look it up.

Please fix the helm chart :)
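On the question of overriding the parameter without editing the downloaded chart: Helm accepts extra values files and --set flags on upgrade. Below is a minimal sketch, assuming a separate override file (the name livenessprobe-override.yaml is hypothetical) that holds only the keys being changed:

```yaml
# livenessprobe-override.yaml (hypothetical file name)
# Contains only the keys being changed; with --reuse-values, every other
# value is taken from the currently deployed release.
sidecars:
  livenessProbe:
    image:
      repository: public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe
      tag: v2.6.0-eks-1-18-17
```

It could be applied with helm upgrade aws-efs-csi-driver ../aws-efs-csi-driver-2.2.7.tgz --namespace kube-system --reuse-values -f livenessprobe-override.yaml. Here --reuse-values keeps the values of the existing release (such as the controller.serviceAccount settings above) and merges the override on top. The same tag change on the command line alone would be --set sidecars.livenessProbe.image.tag=v2.6.0-eks-1-18-17.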

ilvez, Aug 23 '22

Issue: "connection.go:153] Connecting to unix:///csi/csi.sock". Can someone explain the root cause of these noisy logs ("1 connection.go:153] Connecting to unix:///csi/csi.sock")? Do they indicate errors or cause errors? How did this image version (v2.6.0-eks-1-18-17) solve the log flooding? Please let me know if anyone knows. Thanks :)

rajudevop, Oct 18 '22

https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/564#issuecomment-949517622 has the link to the issue in the livenessprobe repository; basically, it is a noisy logging decision in a subcomponent (the Go connection library).

keifgwinn, Oct 19 '22

/lifecycle stale

k8s-triage-robot, Jan 17 '23

Is this fixed in v1.4.9?

tavin, Feb 06 '23

Hi @tavin, yes this is fixed in v1.4.9.

mskanth972, Feb 22 '23

/close

mskanth972, Feb 22 '23

@mskanth972: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot, Feb 22 '23