
transport: loopyWriter.run returning. connection error: desc = "transport is closing"

Open morgoved opened this issue 5 years ago • 12 comments

kiam is not working well. k8s version: v1.14.6; kiam version: v3.5. kiam is deployed in an external OpenStack cloud (to fetch keys from AWS KMS).

INFO: 2020/05/05 17:53:47 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/05 17:53:48 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/05 17:53:57 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/05 17:53:58 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/05 17:54:07 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/05 17:54:08 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/05 17:54:17 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/05 17:54:18 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/05 17:54:27 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/05 17:54:28 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
WARNING: 2020/05/05 17:54:37 transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:443->127.0.0.1:59896: read: connection reset by peer
INFO: 2020/05/05 17:54:37 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/05 17:54:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"

In a pod:

wget -qO- http://169.254.169.254

1.0
2007-01-19
2007-03-01
2007-08-29
2007-10-10
2007-12-15
2008-02-01
2008-09-01
2009-04-04
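That version list is what an EC2-compatible metadata service returns for `GET /`, which suggests the pod's request went straight to the cloud metadata endpoint rather than through kiam. One way to check whether the agent's iptables redirect is intercepting metadata traffic is to request the credentials path that kiam proxies. This is a sketch; `check_kiam_proxy` and its pod-name argument are placeholders, not part of kiam:

```shell
#!/bin/sh
# Sketch: check whether metadata requests from a pod are proxied by the
# kiam agent. The function name and pod name are placeholders; run it
# against any pod on a node where a kiam agent should intercept traffic.
check_kiam_proxy() {
  pod="$1"
  # If kiam intercepts the call, this returns the pod's IAM role name
  # (or an error from the agent). If it returns cloud metadata instead,
  # the iptables redirect is not in effect for that pod's traffic.
  kubectl exec "$pod" -- \
    wget -qO- http://169.254.169.254/latest/meta-data/iam/security-credentials/
}
```

For example: `check_kiam_proxy aws-iam-tester-7f74788df5-frpkm`.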

Deployment parameters:

argocd app create kiam \
 --repo https://uswitch.github.io/kiam-helm-charts/charts/ \
 --helm-chart kiam \
 --revision 5.7.0 \
 --dest-namespace base \
 --dest-server https://10.242.20.10:6443 \
 --helm-set-string "server.extraEnv[0].name=AWS_ACCESS_KEY_ID" \
 --helm-set-string "server.extraEnv[0].value=Axxxxxxxxxx" \
 --helm-set-string "server.extraEnv[1].name=AWS_SECRET_ACCESS_KEY" \
 --helm-set-string "server.extraEnv[1].value=Zzxxxxxxxxxxxxxxxxxxxxxxxxxx" \
 --helm-set-string "server.extraEnv[2].name=GRPC_GO_LOG_SEVERITY_LEVEL" \
 --helm-set-string "server.extraEnv[2].value=info" \
 --helm-set-string "server.extraEnv[3].name=GRPC_GO_LOG_VERBOSITY_LEVEL" \
 --helm-set-string "server.extraEnv[3].value=8" \
 --helm-set-string "extraHostPathMounts[0].name=ssl-certs" \
 --helm-set-string "extraHostPathMounts[0].mountPath=/etc/ssl/certs" \
 --helm-set-string "extraHostPathMounts[0].readOnly=true" \
 --helm-set-string "extraHostPathMounts[0].hostPath=/etc/pki/ca-trust/extracted/pem" \
 -p agent.log.level=debug \
 -p server.log.level=debug \
 -p server.sslCertHostPath=/etc/ssl/certs \
 -p agent.tlsSecret=kiam-agent-certificate-secret \
 -p agent.tlsCerts.caFileName=ca.crt \
 -p agent.tlsCerts.certFileName=tls.crt \
 -p agent.tlsCerts.keyFileName=tls.key \
 -p server.assumeRoleArn=arn:aws:iam::481746587383:role/kiam-server \
 -p server.tlsSecret=kiam-server-certificate-secret \
 -p server.tlsCerts.caFileName=ca.crt \
 -p server.tlsCerts.certFileName=tls.crt \
 -p server.tlsCerts.keyFileName=tls.key \
 -p server.roleBaseArn=arn:aws:iam::481746587383:role/

Can anyone help me?

morgoved avatar May 05 '20 17:05 morgoved

I also created the roles with Terraform:

resource "aws_iam_role" "server_role" {
  name        = "kiam-server"
  description = "Role the Kiam Server process assumes"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::481746587383:user/kiam"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_policy" "server_policy" {
  name        = "kiam_server_policy"
  description = "Policy for the Kiam Server process"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sts:AssumeRole"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

resource "aws_iam_policy_attachment" "server_policy_attach" {
  name       = "kiam-server-attachment"
  roles      = ["${aws_iam_role.server_role.name}"]
  policy_arn = "${aws_iam_policy.server_policy.arn}"
}
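The trust policy above lets the `kiam` IAM user assume the `kiam-server` role. A quick sanity check, sketched below, is to call STS directly with that user's credentials; `verify_kiam_trust` is a hypothetical helper name, and the CLI call assumes the `kiam` user's keys are exported in the environment:

```shell
#!/bin/sh
# Sketch: verify the trust relationship created by the Terraform above.
# Run with the kiam IAM user's credentials exported; on success, STS
# prints temporary credentials for the kiam-server role.
verify_kiam_trust() {
  aws sts assume-role \
    --role-arn "arn:aws:iam::481746587383:role/kiam-server" \
    --role-session-name "kiam-trust-check"
}
```

If this call fails with AccessDenied, the kiam server's "error warming credentials" messages would be an IAM problem rather than a networking one.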

morgoved avatar May 06 '20 14:05 morgoved

Hm... I found that wget -qO- http://169.254.169.254 is returning info from the internal OpenStack API...

morgoved avatar May 06 '20 15:05 morgoved

UPD: I found a problem:

error warming credentials: RequestError: send request failed\ncaused by: Post https://sts.amazonaws.com/: x509: certificate signed by unknown authority

and solved it by changing the certificate path:

/ # ls /etc/ssl/certs/
README                 email-ca-bundle.pem    objsign-ca-bundle.pem  tls-ca-bundle.pem

and

root@DESKTOP-FFV0RBI:~/rd_argo/argo/proj/base# kubectl exec -it -n base kiam-server-8vzrt /bin/sh
/ # cat /etc/ssl/certs/*  | grep zon
# Amazon Root CA 1
# Amazon Root CA 2
# Amazon Root CA 3
# Amazon Root CA 4
# Amazon Root CA 1
# Amazon Root CA 2
# Amazon Root CA 3
# Amazon Root CA 4
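Since the x509 error was fixed by mounting the host CA bundle, a follow-up check, sketched here, is to confirm from inside a server pod that the bundle actually validates the STS endpoint. The function name is a placeholder, and the check assumes busybox `wget` with TLS support is present in the kiam image:

```shell
#!/bin/sh
# Sketch: from inside a kiam-server pod, confirm the mounted CA bundle
# is trusted for sts.amazonaws.com. The pod name is a placeholder.
check_sts_ca() {
  pod="$1"
  # A TLS handshake failure here reproduces the x509 error; an HTTP-level
  # response (even an error document) means certificate validation passed.
  kubectl exec -n base "$pod" -- wget -qO- https://sts.amazonaws.com/
}
```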

Also, in the server logs I see:

{"credentials.access.key":"ASIXXXXXXXXXXXXXXX","credentials.expiration":"2020-05-06T17:01:55Z","credentials.role":"kiam-server","level":"info","msg":"requested new credentials","time":"2020-05-06T16:46:55Z"}
{"credentials.access.key":"ASIXXXXXXXXXXXXXXX","credentials.expiration":"2020-05-06T17:01:55Z","credentials.role":"kiam-server","generation.metadata":0,"level":"info","msg":"fetched credentials","pod.iam.role":"kiam-server","pod.name":"aws-iam-tester-7f74788df5-frpkm","pod.namespace":"default","pod.status.ip":"10.242.32.148","pod.status.phase":"Running","resource.version":"18607042","time":"2020-05-06T16:46:55Z"}

so the server is able to obtain credentials.

However, the

INFO: 2020/05/06 16:51:56 transport: loopyWriter.run returning. connection error: desc = "transport is closing"

and

WARNING: 2020/05/06 16:55:06 transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:443->127.0.0.1:53962: read: connection reset by peer

errors persist.
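The `loopyWriter.run returning` and `HandleStreams failed to read frame` lines are routine connection-teardown chatter from grpc-go, logged at INFO/WARNING, and not necessarily failures. A small helper, sketched here (`filter_grpc_noise` is a hypothetical name), strips them so genuine errors stand out:

```shell
#!/bin/sh
# Hypothetical helper: drop benign gRPC transport chatter from kiam logs
# so real errors are visible.
filter_grpc_noise() {
  grep -vE 'loopyWriter\.run returning|HandleStreams failed to read frame'
}

# Example usage against a live cluster:
#   kubectl logs -n base -l component=server --tail=500 | filter_grpc_noise
```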

morgoved avatar May 06 '20 15:05 morgoved

I found that https://github.com/uswitch/kiam/issues/385 says the server still needs access to the host EC2 API... Does this mean that kiam can only work inside AWS?

morgoved avatar May 06 '20 21:05 morgoved

Kiam does not need to be run in AWS. What is the actual problem you're seeing? You've posted some info and warning logs, but those don't necessarily indicate any actual problems. Are your kiam servers running and passing their health checks? Are you seeing any error messages in the kiam server? Are your kiam agents running and passing their health checks? Are they reporting any errors in their logs?

Joseph-Irving avatar May 07 '20 07:05 Joseph-Irving

> Kiam does not need to be run in AWS. What is the actual problem you're seeing? You've posted some info and warning logs, but those don't necessarily indicate any actual problems. Are your kiam servers running and passing their health checks? Are you seeing any error messages in the kiam server? Are your kiam agents running and passing their health checks? Are they reporting any errors in their logs?

No, the agents are dead:

{"level":"info","msg":"configuring iptables","time":"2020-05-07T11:01:05Z"}
{"level":"info","msg":"started prometheus metric listener 0.0.0.0:9620","time":"2020-05-07T11:01:05Z"}
{"level":"info","msg":"listening :8181","time":"2020-05-07T11:01:05Z"}
{"level":"info","msg":"stopped","time":"2020-05-07T11:01:14Z"}
{"level":"info","msg":"starting server shutdown","time":"2020-05-07T11:01:14Z"}
{"level":"info","msg":"gracefully shutdown server","time":"2020-05-07T11:01:14Z"}

and in the server logs I have:

INFO: 2020/05/07 11:05:31 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/07 11:05:36 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
WARNING: 2020/05/07 11:05:41 transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:443->127.0.0.1:48828: read: connection reset by peer
INFO: 2020/05/07 11:05:41 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
{"generation.metadata":0,"level":"debug","msg":"updated pod","pod.iam.role":"","pod.name":"kiam-agent-nfjln","pod.namespace":"base","pod.status.ip":"10.242.20.17","pod.status.phase":"Running","resource.version":"18911803","time":"2020-05-07T11:05:43Z"}
INFO: 2020/05/07 11:05:46 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/07 11:05:51 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/07 11:05:51 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
{"generation.metadata":0,"level":"debug","msg":"updated pod","pod.iam.role":"","pod.name":"kiam-agent-nfjln","pod.namespace":"base","pod.status.ip":"10.242.20.17","pod.status.phase":"Running","resource.version":"18911851","time":"2020-05-07T11:05:52Z"}
INFO: 2020/05/07 11:05:56 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/07 11:06:01 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/07 11:06:03 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
{"generation.metadata":0,"level":"debug","msg":"updated pod","pod.iam.role":"","pod.name":"kiam-agent-nfjln","pod.namespace":"base","pod.status.ip":"10.242.20.17","pod.status.phase":"Running","resource.version":"18911902","time":"2020-05-07T11:06:04Z"}
INFO: 2020/05/07 11:06:06 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/07 11:06:11 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2020/05/07 11:06:16 transport: loopyWriter.run returning. connection error: desc = "transport is closing"

morgoved avatar May 07 '20 11:05 morgoved

Something is killing your kiam agents, as they're shutting down. Are they failing their liveness probe? If so, you should look into why that's happening.

Joseph-Irving avatar May 07 '20 11:05 Joseph-Irving

> Something is killing your kiam agents, as they're shutting down. Are they failing their liveness probe? If so, you should look into why that's happening.

How can I do that? :) The log level is already set to debug.

morgoved avatar May 07 '20 18:05 morgoved

kubectl describe pod pod-name

Joseph-Irving avatar May 11 '20 07:05 Joseph-Irving

> kubectl describe pod pod-name

Name:           kiam-agent-nfjln
Namespace:      base
Node:           mom-gatekeeper-argo-0-default-group-0/10.242.20.17
Start Time:     Thu, 07 May 2020 00:33:25 +0300
Labels:         app=kiam
                component=agent
                controller-revision-hash=66bb99d55
                pod-template-generation=1
                release=kiam
Annotations:    <none>
Status:         Running
IP:             10.242.20.17
IPs:            <none>
Controlled By:  DaemonSet/kiam-agent
Containers:
  kiam-agent:
    Container ID:  docker://e3dc53b60fc1ad1e579d7e909e0f7fa44467e4f60e037b7795c7bae7b6b615f5
    Image:         quay.io/uswitch/kiam:v3.5
    Image ID:      docker-pullable://quay.io/uswitch/kiam@sha256:923020c93162636af89a54f4e96e062341c6ef87b85a6567d1cf0edb7fbff33c
    Port:          <none>
    Host Port:     <none>
    Command:
      /kiam
      agent
    Args:
      --iptables
      --no-iptables-remove
      --host-interface=cali+
      --json-log
      --level=debug
      --port=8181
      --cert=/etc/kiam/tls/tls.crt
      --key=/etc/kiam/tls/tls.key
      --ca=/etc/kiam/tls/ca.crt
      --server-address=kiam-server:443
      --prometheus-listen-addr=0.0.0.0:9620
      --prometheus-sync-interval=5s
      --gateway-timeout-creation=1s
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 11 May 2020 21:45:43 +0300
      Finished:     Mon, 11 May 2020 21:45:54 +0300
    Ready:          False
    Restart Count:  2574
    Liveness:       http-get http://:8181/ping delay=3s timeout=1s period=3s #success=1 #failure=3
    Environment:
      HOST_IP:   (v1:status.podIP)
    Mounts:
      /etc/kiam/tls from tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kiam-agent-token-nwjms (ro)
      /var/run/xtables.lock from xtables (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kiam-agent-certificate-secret
    Optional:    false
  xtables:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  kiam-agent-token-nwjms:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kiam-agent-token-nwjms
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age                      From                                            Message
  ----     ------     ----                     ----                                            -------
  Warning  Unhealthy  25m (x7693 over 4d21h)   kubelet, mom-gatekeeper-argo-0-default-group-0  Liveness probe failed: HTTP probe failed with statuscode: 404
  Warning  BackOff    49s (x31472 over 4d21h)  kubelet, mom-gatekeeper-argo-0-default-group-0  Back-off restarting failed container

morgoved avatar May 11 '20 18:05 morgoved

I updated the node selector and tolerations so that kiam-server runs on the master nodes and the agents run on the other nodes, but I still get these errors.

morgoved avatar May 11 '20 23:05 morgoved

You're getting Liveness probe failed: HTTP probe failed with statuscode: 404, which is rather odd. Do you have something else on those nodes that is listening on port 8181?
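Two quick checks for that, sketched below: hit the probe path on :8181 from inside the agent pod, and see which process is actually bound to 8181 on the node. The function names and the namespace are placeholders taken from this thread:

```shell
#!/bin/sh
# Sketch: diagnose the 404 returned by the liveness probe on :8181.

# 1. Request the probe path directly from inside the agent pod. A 404
#    here means the process answering on 8181 does not serve /ping;
#    a connection refused/timeout would point at the process not
#    listening at all.
probe_agent_health() {
  pod="$1"
  kubectl exec -n base "$pod" -- wget -SqO- http://127.0.0.1:8181/ping
}

# 2. On the node itself (e.g. via SSH), list which process owns the
#    port, in case something other than the kiam agent grabbed it.
check_port_owner() {
  ss -ltnp | grep ':8181'
}
```

For example: `probe_agent_health kiam-agent-nfjln`.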

Joseph-Irving avatar May 12 '20 07:05 Joseph-Irving