
fluentd pod can't start - Invalid Kubernetes API endpoint: SSL_connect returned=1

Open cmoulliard opened this issue 8 years ago • 15 comments

The docker container "fabric8/fluentd-kubernetes:v1.10", created for the pod "fluentd-elasticsearch-172.28.128.4", can't start and reports this error:

2016-03-10 15:40:33 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2016-03-10 15:40:35 +0000 [error]: config error file="/etc/fluent/fluent.conf" error="Invalid Kubernetes API endpoint: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed"
2016-03-10 15:40:35 +0000 [info]: process finished code=256
2016-03-10 15:40:35 +0000 [error]: fluentd main process died unexpectedly. restarting.
2016-03-10 15:40:35 +0000 [info]: starting fluentd-0.14.0.pre.1
2016-03-10 15:40:36 +0000 [info]: gem 'fluent-plugin-docker_metadata_filter' version '0.1.1'
2016-03-10 15:40:36 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.3.0'
2016-03-10 15:40:36 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '0.15.0'
2016-03-10 15:40:36 +0000 [info]: gem 'fluentd' version '0.14.0.pre.1'
2016-03-10 15:40:36 +0000 [info]: gem 'fluentd' version '0.12.20'
2016-03-10 15:40:36 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2016-03-10 15:40:38 +0000 [error]: config error file="/etc/fluent/fluent.conf" error="Invalid Kubernetes API endpoint: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed"
2016-03-10 15:40:38 +0000 [info]: process finished code=256

Fluentd config of the pod

[root@fluentd-elasticsearch-172 /]# more /etc/fluent/fluent.conf
<source>
  type tail
  path /var/log/containers/*.log
  pos_file /var/log/es-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%N
  tag kubernetes.*
  format json
  read_from_head true
  keep_time_key true
</source>

<filter kubernetes.**>
  type kubernetes_metadata
</filter>

<match **>
  type elasticsearch
  log_level info
  include_tag_key true
  time_key time
  host elasticsearch
  port 9200
  scheme http


  buffer_type memory

  buffer_chunk_limit 8m
  buffer_queue_limit 8192
  flush_interval 10s
  retry_limit 10
  disable_retry_limit
  retry_wait 1s
  max_retry_wait 60s
  num_threads 1
  logstash_format true
</match>

Vars

declare -x DOCKER_REGISTRY_PORT="tcp://172.30.11.237:5000"
declare -x DOCKER_REGISTRY_PORT_5000_TCP="tcp://172.30.11.237:5000"
declare -x DOCKER_REGISTRY_PORT_5000_TCP_ADDR="172.30.11.237"
declare -x DOCKER_REGISTRY_PORT_5000_TCP_PORT="5000"
declare -x DOCKER_REGISTRY_PORT_5000_TCP_PROTO="tcp"
declare -x DOCKER_REGISTRY_SERVICE_HOST="172.30.11.237"
declare -x DOCKER_REGISTRY_SERVICE_PORT="5000"
declare -x DOCKER_REGISTRY_SERVICE_PORT_5000_TCP="5000"
declare -x ELASTICSEARCH_HOST="elasticsearch"
declare -x ELASTICSEARCH_PORT="9200"
declare -x ELASTICSEARCH_PORT_9200_TCP="tcp://172.30.135.114:9200"
declare -x ELASTICSEARCH_PORT_9200_TCP_ADDR="172.30.135.114"
declare -x ELASTICSEARCH_PORT_9200_TCP_PORT="9200"
declare -x ELASTICSEARCH_PORT_9200_TCP_PROTO="tcp"
declare -x ELASTICSEARCH_SERVICE_HOST="172.30.135.114"
declare -x ELASTICSEARCH_SERVICE_PORT="9200"
declare -x FABRIC8_DOCKER_REGISTRY_PORT="tcp://172.30.235.198:80"
declare -x FABRIC8_DOCKER_REGISTRY_PORT_80_TCP="tcp://172.30.235.198:80"
declare -x FABRIC8_DOCKER_REGISTRY_PORT_80_TCP_ADDR="172.30.235.198"
declare -x FABRIC8_DOCKER_REGISTRY_PORT_80_TCP_PORT="80"
declare -x FABRIC8_DOCKER_REGISTRY_PORT_80_TCP_PROTO="tcp"
declare -x FABRIC8_DOCKER_REGISTRY_SERVICE_HOST="172.30.235.198"
declare -x FABRIC8_DOCKER_REGISTRY_SERVICE_PORT="80"
declare -x FABRIC8_FORGE_PORT="tcp://172.30.61.51:80"
declare -x FABRIC8_FORGE_PORT_80_TCP="tcp://172.30.61.51:80"
declare -x FABRIC8_FORGE_PORT_80_TCP_ADDR="172.30.61.51"
declare -x FABRIC8_FORGE_PORT_80_TCP_PORT="80"
declare -x FABRIC8_FORGE_PORT_80_TCP_PROTO="tcp"
declare -x FABRIC8_FORGE_SERVICE_HOST="172.30.61.51"
declare -x FABRIC8_FORGE_SERVICE_PORT="80"
declare -x FABRIC8_PORT="tcp://172.30.170.143:80"
declare -x FABRIC8_PORT_80_TCP="tcp://172.30.170.143:80"
declare -x FABRIC8_PORT_80_TCP_ADDR="172.30.170.143"
declare -x FABRIC8_PORT_80_TCP_PORT="80"
declare -x FABRIC8_PORT_80_TCP_PROTO="tcp"
declare -x FABRIC8_SERVICE_HOST="172.30.170.143"
declare -x FABRIC8_SERVICE_PORT="80"
declare -x FLUENTD_VERSION="0.14.0.pre.1"
declare -x GOGS_PORT="tcp://172.30.123.138:80"
declare -x GOGS_PORT_80_TCP="tcp://172.30.123.138:80"
declare -x GOGS_PORT_80_TCP_ADDR="172.30.123.138"
declare -x GOGS_PORT_80_TCP_PORT="80"
declare -x GOGS_PORT_80_TCP_PROTO="tcp"
declare -x GOGS_SERVICE_HOST="172.30.123.138"
declare -x GOGS_SERVICE_PORT="80"
declare -x GOGS_SSH_PORT="tcp://172.30.47.28:22"
declare -x GOGS_SSH_PORT_22_TCP="tcp://172.30.47.28:22"
declare -x GOGS_SSH_PORT_22_TCP_ADDR="172.30.47.28"
declare -x GOGS_SSH_PORT_22_TCP_PORT="22"
declare -x GOGS_SSH_PORT_22_TCP_PROTO="tcp"
declare -x GOGS_SSH_SERVICE_HOST="172.30.47.28"
declare -x GOGS_SSH_SERVICE_PORT="22"
declare -x HOME="/root"
declare -x HOSTNAME="fluentd-elasticsearch-172.28.128.4"
declare -x JENKINS_PORT="tcp://172.30.61.148:80"
declare -x JENKINS_PORT_50000_TCP="tcp://172.30.61.148:50000"
declare -x JENKINS_PORT_50000_TCP_ADDR="172.30.61.148"
declare -x JENKINS_PORT_50000_TCP_PORT="50000"
declare -x JENKINS_PORT_50000_TCP_PROTO="tcp"
declare -x JENKINS_PORT_80_TCP="tcp://172.30.61.148:80"
declare -x JENKINS_PORT_80_TCP_ADDR="172.30.61.148"
declare -x JENKINS_PORT_80_TCP_PORT="80"
declare -x JENKINS_PORT_80_TCP_PROTO="tcp"
declare -x JENKINS_SERVICE_HOST="172.30.61.148"
declare -x JENKINS_SERVICE_PORT="80"
declare -x JENKINS_SERVICE_PORT_AGENT="50000"
declare -x JENKINS_SERVICE_PORT_HTTP="80"
declare -x KIBANA_PORT="tcp://172.30.180.252:80"
declare -x KIBANA_PORT_80_TCP="tcp://172.30.180.252:80"
declare -x KIBANA_PORT_80_TCP_ADDR="172.30.180.252"
declare -x KIBANA_PORT_80_TCP_PORT="80"
declare -x KIBANA_PORT_80_TCP_PROTO="tcp"
declare -x KIBANA_SERVICE_HOST="172.30.180.252"
declare -x KIBANA_SERVICE_PORT="80"
declare -x KUBERNETES_PORT="tcp://172.30.0.1:443"
declare -x KUBERNETES_PORT_443_TCP="tcp://172.30.0.1:443"
declare -x KUBERNETES_PORT_443_TCP_ADDR="172.30.0.1"
declare -x KUBERNETES_PORT_443_TCP_PORT="443"
declare -x KUBERNETES_PORT_443_TCP_PROTO="tcp"
declare -x KUBERNETES_PORT_53_TCP="tcp://172.30.0.1:53"
declare -x KUBERNETES_PORT_53_TCP_ADDR="172.30.0.1"
declare -x KUBERNETES_PORT_53_TCP_PORT="53"
declare -x KUBERNETES_PORT_53_TCP_PROTO="tcp"
declare -x KUBERNETES_PORT_53_UDP="udp://172.30.0.1:53"
declare -x KUBERNETES_PORT_53_UDP_ADDR="172.30.0.1"
declare -x KUBERNETES_PORT_53_UDP_PORT="53"
declare -x KUBERNETES_PORT_53_UDP_PROTO="udp"
declare -x KUBERNETES_SERVICE_HOST="172.30.0.1"
declare -x KUBERNETES_SERVICE_PORT="443"
declare -x KUBERNETES_SERVICE_PORT_DNS="53"
declare -x KUBERNETES_SERVICE_PORT_DNS_TCP="53"
declare -x KUBERNETES_SERVICE_PORT_HTTPS="443"
declare -x LD_LIBRARY_PATH="/opt/rh/rh-ruby22/root/usr/lib64"
declare -x LESSOPEN="||/usr/bin/lesspipe.sh %s"
declare -x LS_COLORS=""
declare -x NEXUS_PORT="tcp://172.30.93.189:80"
declare -x NEXUS_PORT_80_TCP="tcp://172.30.93.189:80"
declare -x NEXUS_PORT_80_TCP_ADDR="172.30.93.189"
declare -x NEXUS_PORT_80_TCP_PORT="80"
declare -x NEXUS_PORT_80_TCP_PROTO="tcp"
declare -x NEXUS_SERVICE_HOST="172.30.93.189"
declare -x NEXUS_SERVICE_PORT="80"
declare -x OLDPWD
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
declare -x PWD="/"
declare -x ROUTER_PORT="tcp://172.30.12.59:80"
declare -x ROUTER_PORT_80_TCP="tcp://172.30.12.59:80"
declare -x ROUTER_PORT_80_TCP_ADDR="172.30.12.59"
declare -x ROUTER_PORT_80_TCP_PORT="80"
declare -x ROUTER_PORT_80_TCP_PROTO="tcp"
declare -x ROUTER_PORT_9101_TCP="tcp://172.30.12.59:9101"
declare -x ROUTER_PORT_9101_TCP_ADDR="172.30.12.59"
declare -x ROUTER_PORT_9101_TCP_PORT="9101"
declare -x ROUTER_PORT_9101_TCP_PROTO="tcp"
declare -x ROUTER_SERVICE_HOST="172.30.12.59"
declare -x ROUTER_SERVICE_PORT="80"
declare -x ROUTER_SERVICE_PORT_80_TCP="80"
declare -x ROUTER_SERVICE_PORT_9101_TCP="9101"
declare -x SHLVL="1"

cmoulliard avatar Mar 10 '16 15:03 cmoulliard

So, this is an issue with the kubeclient gem (https://github.com/abonas/kubeclient). I can reproduce it:

require 'kubeclient'

ssl_options = {
  #cert_store: OpenSSL::X509::Certificate.new(File.read('/etc/ssl/certs/ca-bundle.crt')),
  #client_cert: OpenSSL::X509::Certificate.new(File.read('/etc/ssl/certs/ca-bundle.crt')),
  #client_key:  OpenSSL::PKey::RSA.new(File.read('/path/to/client.key')),
  #ca_file:     '/path/to/ca.crt',
  verify_ssl:  OpenSSL::SSL::VERIFY_PEER
}
client = Kubeclient::Client.new 'https://172.17.0.1:8443/api/' , "v1",
                                ssl_options: ssl_options

This fails with:

/opt/rh/rh-ruby22/root/usr/local/share/gems/gems/kubeclient-0.4.0/lib/kubeclient/common.rb:16:in `rescue in handle_exception': SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (KubeException)
    from /opt/rh/rh-ruby22/root/usr/local/share/gems/gems/kubeclient-0.4.0/lib/kubeclient/common.rb:8:in `handle_exception'

cmoulliard avatar Mar 10 '16 17:03 cmoulliard

Right, and that is totally expected because you've disabled the custom CA certificate... we enable this in fluentd so it uses the cluster CA certificate.

jimmidyson avatar Mar 10 '16 18:03 jimmidyson
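(For reference, what the fluentd image does under the covers is roughly to point kubeclient at the cluster CA and service-account token mounted into every pod. A minimal sketch, assuming a kubeclient version with auth_options support, roughly 1.1 or later, and the standard service-account mount paths:

require 'kubeclient'

sa_dir = '/var/run/secrets/kubernetes.io/serviceaccount'

ssl_options = {
  ca_file:    File.join(sa_dir, 'ca.crt'),      # cluster CA mounted into the pod
  verify_ssl: OpenSSL::SSL::VERIFY_PEER
}
auth_options = {
  bearer_token_file: File.join(sa_dir, 'token') # service-account token
}

client = Kubeclient::Client.new(
  "https://#{ENV['KUBERNETES_SERVICE_HOST']}:#{ENV['KUBERNETES_SERVICE_PORT']}/api/",
  'v1',
  ssl_options:  ssl_options,
  auth_options: auth_options
)

# Succeeds when the mounted CA matches the apiserver's serving certificate;
# otherwise it raises the same "certificate verify failed" error as above.
puts client.get_pods(namespace: 'default').length

)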

How can I troubleshoot this? I even set verify_ssl to false temporarily and it still fails with the same error. I have all three files (the CA plus the API server's public and private keys) on the masters, but nothing on the kubelets, which I'm assuming is the issue. The pod has a service account with ca.crt, namespace and token. What's missing?

therc avatar Mar 23 '16 22:03 therc

(I'm actually using a custom gcr.io/google_containers/fluentd-elasticsearch, which I'm trying to improve, but this is basically the same issue.)

therc avatar Mar 23 '16 22:03 therc

Can you check whether the service account token & CA cert are mounted in the /var/run/kubernetes.io/serviceaccount directory in the fluentd pod?

jimmidyson avatar Mar 27 '16 10:03 jimmidyson

Sorry that should be /var/run/secrets/kubernetes.io/serviceaccount

jimmidyson avatar Mar 27 '16 10:03 jimmidyson
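A quick way to check that, assuming kubectl (or oc) access to the cluster; the pod name below is the one from this issue, substitute your own:

kubectl exec fluentd-elasticsearch-172.28.128.4 -- ls -l /var/run/secrets/kubernetes.io/serviceaccount
# expect ca.crt, namespace and token to be listed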

I haven't been able to catch it live, because the pod only stays up for a second, but from docker inspect:

"Binds": [
  "/var/lib/kubelet/pods/bbc2ba16-f418-11e5-a35d-0e581928f9dd/volumes/kubernetes.io~secret/default-token-0qnq5:/var/run/secrets/kubernetes.io/serviceaccount:ro",
 ls /var/lib/kubelet/pods/bbc2ba16-f418-11e5-a35d-0e581928f9dd/volumes/kubernetes.io~secret/default-token-0qnq5 -l
total 12
-r--r--r-- 1 root root 1757 Mar 27 12:37 ca.crt
-r--r--r-- 1 root root   11 Mar 27 12:37 namespace
-r--r--r-- 1 root root  856 Mar 27 12:37 token

therc avatar Mar 27 '16 12:03 therc

From the apiserver side: http: TLS handshake error from 10.68.5.3:41240: remote error: unknown certificate authority

therc avatar Mar 27 '16 13:03 therc

Can you verify that the CA certificate being mounted matches the API server CA certificate? I might add some more debugging to the gem to output info on the CA certificate used.

jimmidyson avatar Mar 27 '16 19:03 jimmidyson

I'm testing with the schedulable=false kubelet that runs on the first master, so they're both on the same machine, and the checksum matches:

37ea98342398471dffa11babc92bb061  /srv/kube/ca.crt
37ea98342398471dffa11babc92bb061  /var/lib/kubelet/pods/bbc2ba16-f418-11e5-a35d-0e581928f9dd/volumes/kubernetes.io~secret/default-token-0qnq5/ca.crt

I suspect that it's something to do with the kubeconfig in the kubelet. The cluster was brought up by custom Ansible scripts. More debugging information would be great, as I suspect I won't be the first to hit a similar issue.

therc avatar Mar 27 '16 19:03 therc
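One way to verify the mounted CA against what the apiserver actually presents, a sketch assuming openssl is available in the pod or on the node and that the standard in-cluster service variables are set:

SA=/var/run/secrets/kubernetes.io/serviceaccount
openssl s_client -connect "$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT" \
  -CAfile "$SA/ca.crt" </dev/null 2>/dev/null | grep 'Verify return code'
# "Verify return code: 0 (ok)" means the mounted CA verifies the apiserver's
# serving certificate; anything else points at the kind of mismatch reported above.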

Maybe it needs to use kubeclient 1.1.2? https://github.com/abonas/kubeclient/issues/158

therc avatar Mar 31 '16 17:03 therc

I had a similar issue when building a custom version of gcr.io/google_containers/fluentd-elasticsearch from scratch by editing the Dockerfile at https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/fluentd-elasticsearch/fluentd-es-image/Dockerfile

I resolved it by starting with the gcr.io/google_containers/fluentd-elasticsearch:1.17 container and working from there.

haroldwoo avatar Jun 22 '16 21:06 haroldwoo

I am running into a similar error, a timeout connecting to the apiserver, but it occurs on the fluentd pods running on the minions (AWS). I'm thinking this is a connectivity issue rather than the fluentd pod failing to start.

2017-02-20T22:41:40.318705800Z 2017-02-20 22:41:40 +0000 [info]: process finished code=256
2017-02-20T22:41:40.318718054Z 2017-02-20 22:41:40 +0000 [error]: fluentd main process died unexpectedly. restarting.
2017-02-20T22:41:40.318766746Z 2017-02-20 22:41:40 +0000 [info]: starting fluentd-0.12.31
2017-02-20T22:41:40.399446022Z 2017-02-20 22:41:40 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.9.2'
2017-02-20T22:41:40.399475598Z 2017-02-20 22:41:40 +0000 [info]: gem 'fluent-plugin-journal-parser' version '0.1.0'
2017-02-20T22:41:40.399483008Z 2017-02-20 22:41:40 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '0.26.2'
2017-02-20T22:41:40.399486705Z 2017-02-20 22:41:40 +0000 [info]: gem 'fluent-plugin-record-reformer' version '0.8.3'
2017-02-20T22:41:40.399490068Z 2017-02-20 22:41:40 +0000 [info]: gem 'fluentd' version '0.12.31'
2017-02-20T22:41:40.399579007Z 2017-02-20 22:41:40 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2017-02-20T22:42:40.743439680Z 2017-02-20 22:42:40 +0000 [error]: config error file="/fluentd/etc/fluent.conf" error="Invalid Kubernetes API v1 endpoint https://apiServerIP:443/api: Timed out connecting to server"
2017-02-20T22:42:40.745553580Z 2017-02-20 22:42:40 +0000 [info]: process finished code=256
2017-02-20T22:42:40.745575572Z 2017-02-20 22:42:40 +0000 [error]: fluentd main process died unexpectedly. restarting.

bamb00 avatar Feb 20 '17 23:02 bamb00
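If it really is connectivity rather than certificates, a simple probe from the affected node narrows it down (apiServerIP is the placeholder used in the log above):

curl -k --connect-timeout 5 https://apiServerIP:443/api
# A timeout confirms a network problem (e.g. AWS security groups not letting the
# minions reach the masters on 443); an HTTP 401/403 response means connectivity is fine.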

To work around the "Invalid Kubernetes API v1 endpoint XXX SSL_connect returned=1 errno=0 state=error: certificate verify failed" error, just comment out or delete the following from your td-agent.conf:

<filter kubernetes.**>
  type kubernetes_metadata
</filter>

placydo avatar May 04 '17 13:05 placydo
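Note that removing the filter also removes all Kubernetes metadata enrichment from the logs. If the autodetected endpoint or CA is the problem, an alternative is to configure the filter explicitly; a sketch using parameters documented by fluent-plugin-kubernetes_metadata_filter (check that your plugin version supports them):

<filter kubernetes.**>
  type kubernetes_metadata
  kubernetes_url https://kubernetes.default.svc
  ca_file /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file /var/run/secrets/kubernetes.io/serviceaccount/token
  verify_ssl true
</filter>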

Delete your token in secrets and a new one will be created.

vickkrish avatar Jan 10 '18 03:01 vickkrish
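Roughly, with kubectl (the secret name default-token-0qnq5 is the one seen earlier in this thread; yours will differ):

kubectl get secrets | grep default-token
kubectl delete secret default-token-0qnq5
# The token controller recreates the secret with a fresh token; delete the
# fluentd pod afterwards so the replacement pod mounts the new secret.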