fabric8
fluentd pod can't start - Invalid Kubernetes API endpoint: SSL_connect returned=1
The Docker container "fabric8/fluentd-kubernetes:v1.10", created for the pod "fluentd-elasticsearch-172.28.128.4", can't start and reports this error:
2016-03-10 15:40:33 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2016-03-10 15:40:35 +0000 [error]: config error file="/etc/fluent/fluent.conf" error="Invalid Kubernetes API endpoint: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed"
2016-03-10 15:40:35 +0000 [info]: process finished code=256
2016-03-10 15:40:35 +0000 [error]: fluentd main process died unexpectedly. restarting.
2016-03-10 15:40:35 +0000 [info]: starting fluentd-0.14.0.pre.1
2016-03-10 15:40:36 +0000 [info]: gem 'fluent-plugin-docker_metadata_filter' version '0.1.1'
2016-03-10 15:40:36 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.3.0'
2016-03-10 15:40:36 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '0.15.0'
2016-03-10 15:40:36 +0000 [info]: gem 'fluentd' version '0.14.0.pre.1'
2016-03-10 15:40:36 +0000 [info]: gem 'fluentd' version '0.12.20'
2016-03-10 15:40:36 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2016-03-10 15:40:38 +0000 [error]: config error file="/etc/fluent/fluent.conf" error="Invalid Kubernetes API endpoint: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed"
2016-03-10 15:40:38 +0000 [info]: process finished code=256
Fluentd config of the pod
[root@fluentd-elasticsearch-172 /]# more /etc/fluent/fluent.conf
<source>
type tail
path /var/log/containers/*.log
pos_file /var/log/es-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%N
tag kubernetes.*
format json
read_from_head true
keep_time_key true
</source>
<filter kubernetes.**>
type kubernetes_metadata
</filter>
<match **>
type elasticsearch
log_level info
include_tag_key true
time_key time
host elasticsearch
port 9200
scheme http
buffer_type memory
buffer_chunk_limit 8m
buffer_queue_limit 8192
flush_interval 10s
retry_limit 10
disable_retry_limit
retry_wait 1s
max_retry_wait 60s
num_threads 1
logstash_format true
</match>
Environment variables in the pod:
declare -x DOCKER_REGISTRY_PORT="tcp://172.30.11.237:5000"
declare -x DOCKER_REGISTRY_PORT_5000_TCP="tcp://172.30.11.237:5000"
declare -x DOCKER_REGISTRY_PORT_5000_TCP_ADDR="172.30.11.237"
declare -x DOCKER_REGISTRY_PORT_5000_TCP_PORT="5000"
declare -x DOCKER_REGISTRY_PORT_5000_TCP_PROTO="tcp"
declare -x DOCKER_REGISTRY_SERVICE_HOST="172.30.11.237"
declare -x DOCKER_REGISTRY_SERVICE_PORT="5000"
declare -x DOCKER_REGISTRY_SERVICE_PORT_5000_TCP="5000"
declare -x ELASTICSEARCH_HOST="elasticsearch"
declare -x ELASTICSEARCH_PORT="9200"
declare -x ELASTICSEARCH_PORT_9200_TCP="tcp://172.30.135.114:9200"
declare -x ELASTICSEARCH_PORT_9200_TCP_ADDR="172.30.135.114"
declare -x ELASTICSEARCH_PORT_9200_TCP_PORT="9200"
declare -x ELASTICSEARCH_PORT_9200_TCP_PROTO="tcp"
declare -x ELASTICSEARCH_SERVICE_HOST="172.30.135.114"
declare -x ELASTICSEARCH_SERVICE_PORT="9200"
declare -x FABRIC8_DOCKER_REGISTRY_PORT="tcp://172.30.235.198:80"
declare -x FABRIC8_DOCKER_REGISTRY_PORT_80_TCP="tcp://172.30.235.198:80"
declare -x FABRIC8_DOCKER_REGISTRY_PORT_80_TCP_ADDR="172.30.235.198"
declare -x FABRIC8_DOCKER_REGISTRY_PORT_80_TCP_PORT="80"
declare -x FABRIC8_DOCKER_REGISTRY_PORT_80_TCP_PROTO="tcp"
declare -x FABRIC8_DOCKER_REGISTRY_SERVICE_HOST="172.30.235.198"
declare -x FABRIC8_DOCKER_REGISTRY_SERVICE_PORT="80"
declare -x FABRIC8_FORGE_PORT="tcp://172.30.61.51:80"
declare -x FABRIC8_FORGE_PORT_80_TCP="tcp://172.30.61.51:80"
declare -x FABRIC8_FORGE_PORT_80_TCP_ADDR="172.30.61.51"
declare -x FABRIC8_FORGE_PORT_80_TCP_PORT="80"
declare -x FABRIC8_FORGE_PORT_80_TCP_PROTO="tcp"
declare -x FABRIC8_FORGE_SERVICE_HOST="172.30.61.51"
declare -x FABRIC8_FORGE_SERVICE_PORT="80"
declare -x FABRIC8_PORT="tcp://172.30.170.143:80"
declare -x FABRIC8_PORT_80_TCP="tcp://172.30.170.143:80"
declare -x FABRIC8_PORT_80_TCP_ADDR="172.30.170.143"
declare -x FABRIC8_PORT_80_TCP_PORT="80"
declare -x FABRIC8_PORT_80_TCP_PROTO="tcp"
declare -x FABRIC8_SERVICE_HOST="172.30.170.143"
declare -x FABRIC8_SERVICE_PORT="80"
declare -x FLUENTD_VERSION="0.14.0.pre.1"
declare -x GOGS_PORT="tcp://172.30.123.138:80"
declare -x GOGS_PORT_80_TCP="tcp://172.30.123.138:80"
declare -x GOGS_PORT_80_TCP_ADDR="172.30.123.138"
declare -x GOGS_PORT_80_TCP_PORT="80"
declare -x GOGS_PORT_80_TCP_PROTO="tcp"
declare -x GOGS_SERVICE_HOST="172.30.123.138"
declare -x GOGS_SERVICE_PORT="80"
declare -x GOGS_SSH_PORT="tcp://172.30.47.28:22"
declare -x GOGS_SSH_PORT_22_TCP="tcp://172.30.47.28:22"
declare -x GOGS_SSH_PORT_22_TCP_ADDR="172.30.47.28"
declare -x GOGS_SSH_PORT_22_TCP_PORT="22"
declare -x GOGS_SSH_PORT_22_TCP_PROTO="tcp"
declare -x GOGS_SSH_SERVICE_HOST="172.30.47.28"
declare -x GOGS_SSH_SERVICE_PORT="22"
declare -x HOME="/root"
declare -x HOSTNAME="fluentd-elasticsearch-172.28.128.4"
declare -x JENKINS_PORT="tcp://172.30.61.148:80"
declare -x JENKINS_PORT_50000_TCP="tcp://172.30.61.148:50000"
declare -x JENKINS_PORT_50000_TCP_ADDR="172.30.61.148"
declare -x JENKINS_PORT_50000_TCP_PORT="50000"
declare -x JENKINS_PORT_50000_TCP_PROTO="tcp"
declare -x JENKINS_PORT_80_TCP="tcp://172.30.61.148:80"
declare -x JENKINS_PORT_80_TCP_ADDR="172.30.61.148"
declare -x JENKINS_PORT_80_TCP_PORT="80"
declare -x JENKINS_PORT_80_TCP_PROTO="tcp"
declare -x JENKINS_SERVICE_HOST="172.30.61.148"
declare -x JENKINS_SERVICE_PORT="80"
declare -x JENKINS_SERVICE_PORT_AGENT="50000"
declare -x JENKINS_SERVICE_PORT_HTTP="80"
declare -x KIBANA_PORT="tcp://172.30.180.252:80"
declare -x KIBANA_PORT_80_TCP="tcp://172.30.180.252:80"
declare -x KIBANA_PORT_80_TCP_ADDR="172.30.180.252"
declare -x KIBANA_PORT_80_TCP_PORT="80"
declare -x KIBANA_PORT_80_TCP_PROTO="tcp"
declare -x KIBANA_SERVICE_HOST="172.30.180.252"
declare -x KIBANA_SERVICE_PORT="80"
declare -x KUBERNETES_PORT="tcp://172.30.0.1:443"
declare -x KUBERNETES_PORT_443_TCP="tcp://172.30.0.1:443"
declare -x KUBERNETES_PORT_443_TCP_ADDR="172.30.0.1"
declare -x KUBERNETES_PORT_443_TCP_PORT="443"
declare -x KUBERNETES_PORT_443_TCP_PROTO="tcp"
declare -x KUBERNETES_PORT_53_TCP="tcp://172.30.0.1:53"
declare -x KUBERNETES_PORT_53_TCP_ADDR="172.30.0.1"
declare -x KUBERNETES_PORT_53_TCP_PORT="53"
declare -x KUBERNETES_PORT_53_TCP_PROTO="tcp"
declare -x KUBERNETES_PORT_53_UDP="udp://172.30.0.1:53"
declare -x KUBERNETES_PORT_53_UDP_ADDR="172.30.0.1"
declare -x KUBERNETES_PORT_53_UDP_PORT="53"
declare -x KUBERNETES_PORT_53_UDP_PROTO="udp"
declare -x KUBERNETES_SERVICE_HOST="172.30.0.1"
declare -x KUBERNETES_SERVICE_PORT="443"
declare -x KUBERNETES_SERVICE_PORT_DNS="53"
declare -x KUBERNETES_SERVICE_PORT_DNS_TCP="53"
declare -x KUBERNETES_SERVICE_PORT_HTTPS="443"
declare -x LD_LIBRARY_PATH="/opt/rh/rh-ruby22/root/usr/lib64"
declare -x LESSOPEN="||/usr/bin/lesspipe.sh %s"
declare -x LS_COLORS=""
declare -x NEXUS_PORT="tcp://172.30.93.189:80"
declare -x NEXUS_PORT_80_TCP="tcp://172.30.93.189:80"
declare -x NEXUS_PORT_80_TCP_ADDR="172.30.93.189"
declare -x NEXUS_PORT_80_TCP_PORT="80"
declare -x NEXUS_PORT_80_TCP_PROTO="tcp"
declare -x NEXUS_SERVICE_HOST="172.30.93.189"
declare -x NEXUS_SERVICE_PORT="80"
declare -x OLDPWD
declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
declare -x PWD="/"
declare -x ROUTER_PORT="tcp://172.30.12.59:80"
declare -x ROUTER_PORT_80_TCP="tcp://172.30.12.59:80"
declare -x ROUTER_PORT_80_TCP_ADDR="172.30.12.59"
declare -x ROUTER_PORT_80_TCP_PORT="80"
declare -x ROUTER_PORT_80_TCP_PROTO="tcp"
declare -x ROUTER_PORT_9101_TCP="tcp://172.30.12.59:9101"
declare -x ROUTER_PORT_9101_TCP_ADDR="172.30.12.59"
declare -x ROUTER_PORT_9101_TCP_PORT="9101"
declare -x ROUTER_PORT_9101_TCP_PROTO="tcp"
declare -x ROUTER_SERVICE_HOST="172.30.12.59"
declare -x ROUTER_SERVICE_PORT="80"
declare -x ROUTER_SERVICE_PORT_80_TCP="80"
declare -x ROUTER_SERVICE_PORT_9101_TCP="9101"
declare -x SHLVL="1"
So, this is an issue with the kubeclient gem (https://github.com/abonas/kubeclient). I can reproduce it:
require 'kubeclient'
ssl_options = {
#cert_store: OpenSSL::X509::Certificate.new(File.read('/etc/ssl/certs/ca-bundle.crt')),
#client_cert: OpenSSL::X509::Certificate.new(File.read('/etc/ssl/certs/ca-bundle.crt')),
#client_key: OpenSSL::PKey::RSA.new(File.read('/path/to/client.key')),
#ca_file: '/path/to/ca.crt',
verify_ssl: OpenSSL::SSL::VERIFY_PEER
}
client = Kubeclient::Client.new 'https://172.17.0.1:8443/api/' , "v1",
ssl_options: ssl_options
This fails with:
/opt/rh/rh-ruby22/root/usr/local/share/gems/gems/kubeclient-0.4.0/lib/kubeclient/common.rb:16:in `rescue in handle_exception': SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (KubeException)
from /opt/rh/rh-ruby22/root/usr/local/share/gems/gems/kubeclient-0.4.0/lib/kubeclient/common.rb:8:in `handle_exception'
Right, and that is totally expected, because in that snippet you've disabled the custom CA certificate... in fluentd we enable it so that the cluster CA certificate is used.
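For reference, a rough sketch of what that looks like (not the exact fluentd plugin code, and the auth_options keyword may not exist in the 0.4.0 gem from the trace above): point kubeclient at the CA and token that Kubernetes mounts into the pod from the service account, and take the API server address from KUBERNETES_SERVICE_HOST/PORT in the environment dump.
require 'kubeclient'
require 'openssl'

secret_dir = '/var/run/secrets/kubernetes.io/serviceaccount'

# Use the cluster CA mounted into the pod, not the host's ca-bundle.crt.
ssl_options = {
  ca_file:    "#{secret_dir}/ca.crt",
  verify_ssl: OpenSSL::SSL::VERIFY_PEER
}

# Authenticate with the service-account token from the same mount.
auth_options = {
  bearer_token: File.read("#{secret_dir}/token")
}

# KUBERNETES_SERVICE_HOST/PORT from the environment dump above.
client = Kubeclient::Client.new 'https://172.30.0.1:443/api/', 'v1',
                                ssl_options: ssl_options,
                                auth_options: auth_options
client.get_pods  # raises KubeException if the CA or token is wrong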
How do I troubleshoot this? I even set verify_ssl to false temporarily and it still fails with the same error. I have all three files (CA, public and private API keys) on the masters, but nothing on the kubelets, which I'm assuming is the issue. The pod has a service account with ca.crt, namespace and token. What's missing?
(I'm actually using a custom gcr.io/google_containers/fluentd-elasticsearch, which I'm trying to improve, but this is basically the same issue.)
Can you check in the fluentd pod if the service account token & CA crt are mounted in the /var/run/kubernetes.io/serviceaccount directory in the pod?
Sorry that should be /var/run/secrets/kubernetes.io/serviceaccount
I haven't been able to catch it live, because the pod stays up for just a second, but from docker inspect:
"Binds": [
"/var/lib/kubelet/pods/bbc2ba16-f418-11e5-a35d-0e581928f9dd/volumes/kubernetes.io~secret/default-token-0qnq5:/var/run/secrets/kubernetes.io/serviceaccount:ro",
ls /var/lib/kubelet/pods/bbc2ba16-f418-11e5-a35d-0e581928f9dd/volumes/kubernetes.io~secret/default-token-0qnq5 -l
total 12
-r--r--r-- 1 root root 1757 Mar 27 12:37 ca.crt
-r--r--r-- 1 root root 11 Mar 27 12:37 namespace
-r--r--r-- 1 root root 856 Mar 27 12:37 token
From the apiserver side:
http: TLS handshake error from 10.68.5.3:41240: remote error: unknown certificate authority
Can you verify that the CA certificate being mounted matches the API server CA certificate? I might add some more debugging to the gem to output info on the CA certificate used.
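In the meantime, a small standalone snippet (assumed paths, not part of the gem) can print the subject and fingerprint of the CA certificate the pod actually mounts, to compare against the one the API server was started with:
require 'openssl'

# Standard service-account mount path inside the pod (assumed).
ca_path = '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
ca = OpenSSL::X509::Certificate.new(File.read(ca_path))

puts "subject:   #{ca.subject}"
puts "issuer:    #{ca.issuer}"
puts "not_after: #{ca.not_after}"
puts "sha1:      #{OpenSSL::Digest::SHA1.hexdigest(ca.to_der)}"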
I'm testing with the schedulable=false kubelet that runs on the first master, so they're both on the same machine, and the checksum matches:
37ea98342398471dffa11babc92bb061 /srv/kube/ca.crt
37ea98342398471dffa11babc92bb061 /var/lib/kubelet/pods/bbc2ba16-f418-11e5-a35d-0e581928f9dd/volumes/kubernetes.io~secret/default-token-0qnq5/ca.crt
I suspect that it's something to do with the kubeconfig in the kubelet. The cluster was brought up by custom Ansible scripts. More debugging information would be great, as I suspect I won't be the first to hit a similar issue.
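One way to reproduce the TLS verification outside of fluentd is a plain OpenSSL handshake against the in-cluster API server address, using the mounted CA (a sketch; host and port are the KUBERNETES_SERVICE_HOST/PORT values from the environment dump above):
require 'openssl'
require 'socket'

# Mounted cluster CA and the in-cluster API server address (from the env dump).
ca_file = '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
host, port = '172.30.0.1', 443

ctx = OpenSSL::SSL::SSLContext.new
ctx.ca_file = ca_file
ctx.verify_mode = OpenSSL::SSL::VERIFY_PEER

tcp = TCPSocket.new(host, port)
ssl = OpenSSL::SSL::SSLSocket.new(tcp, ctx)
begin
  ssl.connect
  puts "handshake ok, server cert: #{ssl.peer_cert.subject}"
rescue OpenSSL::SSL::SSLError => e
  puts "handshake failed: #{e.message}"  # same 'certificate verify failed' fluentd reports
ensure
  ssl.close rescue nil
  tcp.close
end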
Maybe it needs to use kubeclient 1.1.2? https://github.com/abonas/kubeclient/issues/158
I had a similar issue when building a custom version of gcr.io/google_containers/fluentd-elasticsearch from scratch by editing the Dockerfile at https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/fluentd-elasticsearch/fluentd-es-image/Dockerfile
I resolved it by starting with the gcr.io/google_containers/fluentd-elasticsearch:1.17 container and working from there.
I am running into a similar error, a timeout connecting to the apiserver, but it occurs in the fluentd pods running on the minions (AWS). I'm thinking this is a connectivity issue rather than the fluentd pod being unable to start.
2017-02-20 22:41:40 +0000 [info]: process finished code=256
2017-02-20 22:41:40 +0000 [error]: fluentd main process died unexpectedly. restarting.
2017-02-20 22:41:40 +0000 [info]: starting fluentd-0.12.31
2017-02-20 22:41:40 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.9.2'
2017-02-20 22:41:40 +0000 [info]: gem 'fluent-plugin-journal-parser' version '0.1.0'
2017-02-20 22:41:40 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '0.26.2'
2017-02-20 22:41:40 +0000 [info]: gem 'fluent-plugin-record-reformer' version '0.8.3'
2017-02-20 22:41:40 +0000 [info]: gem 'fluentd' version '0.12.31'
2017-02-20 22:41:40 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2017-02-20 22:42:40 +0000 [error]: config error file="/fluentd/etc/fluent.conf" error="Invalid Kubernetes API v1 endpoint https://apiServerIP:443/api: Timed out connecting to server"
2017-02-20 22:42:40 +0000 [info]: process finished code=256
2017-02-20 22:42:40 +0000 [error]: fluentd main process died unexpectedly. restarting.
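Before digging into certificates, a minimal reachability check (a sketch; 'apiServerIP' is the placeholder from the log above, substitute the real address) can confirm whether the pod can open a TCP connection to the API server at all. A timeout here points at networking on the minions (security groups, routes) rather than at fluentd itself:
require 'socket'
require 'timeout'

# 'apiServerIP' is the placeholder from the log above; replace with the real address.
host, port = 'apiServerIP', 443

begin
  Timeout.timeout(5) { TCPSocket.new(host, port).close }
  puts "TCP connect to #{host}:#{port} ok"
rescue Timeout::Error, SystemCallError, SocketError => e
  puts "cannot reach #{host}:#{port}: #{e.class}: #{e.message}"
end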
To solve your problem with "Invalid Kubernetes API v1 endpoint XXX SSL_connect returned=1 errno=0 state=error: certificate verify failed", just comment out or delete this block from your td-agent.conf:
<filter kubernetes.**>
type kubernetes_metadata
</filter>
Delete your token secret and Kubernetes will create a new one.