workflow
Fluentd pod crashing on Azure Container Service
Hi All,
I'm following the instructions to set up Deis on Azure Container Service. One of the deis-logger-fluentd pods is crashing with the following log.
2017-08-05 07:21:26 +0000 [info]: reading config file path="/opt/fluentd/conf/fluentd.conf"
2017-08-05 07:22:27 +0000 [error]: config error file="/opt/fluentd/conf/fluentd.conf" error_class=Fluent::ConfigError error="Invalid Kubernetes API v1 endpoint https://10.0.0.1:443: Timed out connecting to server"
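The error above is a plain connection timeout to the API server address 10.0.0.1:443. One way to check whether that address is reachable from the master node at all is a bare TCP connect with a timeout. The sketch below uses a hypothetical helper (`can_reach` is not part of fluentd or Kubernetes) and only tests TCP reachability, not TLS or authentication:

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper: it checks TCP reachability only, not TLS
    or API authentication.
    """
    try:
        # create_connection resolves the host and applies the timeout to the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts
        # (socket.timeout is a subclass of OSError on Python 3).
        return False
```

For example, calling `can_reach("10.0.0.1", 443)` from a pod on the master node that has Python available; a `False` there would be consistent with the timeout fluentd reports.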
Any ideas?
Thanks.
A bit more info. I created the ACS cluster with 1 agent. The fluentd pod that is crashing is on the master node. The pod running on the agent appears to be working fine.
We're facing the same issue, same symptoms and circumstances as @sbulman. The fluentd logger pod continually crashes on the master node on Azure ACS.
There should not be a fluentd pod running on the master node. There was an open ticket about DaemonSet pods being accidentally scheduled on the Kubernetes master node, which was eventually fixed upstream.
There is more background in that ticket, which was resolved in Kubernetes 1.5.0+ via https://github.com/kubernetes/kubernetes/pull/35526.
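The thread doesn't describe exactly what the upstream change does. As general background, a pod is kept off a node carrying a `NoSchedule` taint unless one of its tolerations matches that taint. The sketch below is a simplified model of that matching rule (it ignores `NoExecute` eviction and `tolerationSeconds`), using the master taint key that appears later in this thread; the function names are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""             # empty key together with operator "Exists" matches every taint
    operator: str = "Equal"   # "Equal" or "Exists"
    value: str = ""
    effect: str = ""          # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified toleration/taint matching rule."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    # "Equal" (the default) also requires the values to match.
    return tol.value == taint.value


def blocked_by_no_schedule(taints: List[Taint], tolerations: List[Toleration]) -> bool:
    """True if some NoSchedule taint on the node is not tolerated by the pod."""
    return any(
        t.effect == "NoSchedule" and not any(tolerates(tol, t) for tol in tolerations)
        for t in taints
    )
```

With the master taint `node-role.kubernetes.io/master=true:NoSchedule`, a pod with no tolerations is blocked, while a pod tolerating that key (with `Exists`, or `Equal` and value `true`) is not.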
OK, thanks @bacongobbler for the context. It still appears to be an issue on ACS today, though. Any thoughts are much appreciated!
The fluentd logger pod event for the master node indicates the following error:
Error syncing pod, skipping: failed to "StartContainer" for "deis-logger-fluentd" with CrashLoopBackOff: "Back-off 10s restarting failed container=deis-logger-fluentd pod=deis-logger-fluentd-swjnl_deis
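The "Back-off 10s" in that event is the kubelet's restart back-off for a crashing container. My understanding is that the kubelet starts at 10 seconds and doubles the delay after each failed restart, capping it at five minutes; treat those numbers as an assumption rather than something confirmed in this thread. A small sketch of that schedule (`crashloop_delays` is a hypothetical name):

```python
from typing import List


def crashloop_delays(restarts: int, initial: int = 10, cap: int = 300) -> List[int]:
    """Delay in seconds before each of the next `restarts` restarts.

    Assumes an exponential back-off that starts at `initial` seconds and
    doubles after every failed restart, never exceeding `cap`.
    """
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays
```

For example, `crashloop_delays(7)` gives `[10, 20, 40, 80, 160, 300, 300]`, which is why a pod stuck in CrashLoopBackOff ends up restarting only about every five minutes.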
K8S versions (client and Azure Container Service):
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-14T06:55:55Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.6", GitCommit:"7fa1c1756d8bc963f1a389f4a6937dc71f08ada2", GitTreeState:"clean", BuildDate:"2017-06-16T18:21:54Z", GoVersion:"go1.7.6", Compiler:"gc", Platform:"linux/amd64"}
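Since the fix is said to be in Kubernetes 1.5.0+, one quick way to compare the reported server version against that threshold is to pull `GitVersion` out of the `version.Info` string. A minimal sketch, assuming the `kubectl version` output format shown above (`git_version` and `has_daemonset_fix` are hypothetical names):

```python
import re
from typing import Tuple

# Server line copied from the `kubectl version` output above.
SERVER_INFO = (
    'version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.6", '
    'GitCommit:"7fa1c1756d8bc963f1a389f4a6937dc71f08ada2", GitTreeState:"clean", '
    'BuildDate:"2017-06-16T18:21:54Z", GoVersion:"go1.7.6", Compiler:"gc", '
    'Platform:"linux/amd64"}'
)


def git_version(info: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a version.Info string."""
    m = re.search(r'GitVersion:"v(\d+)\.(\d+)\.(\d+)', info)
    if m is None:
        raise ValueError("no GitVersion found in version string")
    major, minor, patch = (int(x) for x in m.groups())
    return major, minor, patch


def has_daemonset_fix(info: str, fixed_in: Tuple[int, int, int] = (1, 5, 0)) -> bool:
    """True if the reported version is at or above the release said to contain the fix."""
    return git_version(info) >= fixed_in
```

With the server string above this gives `(1, 6, 6)` and `True`: the cluster is already past 1.5.0, yet the fluentd pod still lands on the master, so the version alone does not appear to keep it off.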
Deis version 2.18.0
The fluentd pod is definitely running on the master node on ACS, as the event logs show; in this case the node is k8s-master-47933ef9-0.
I also hit the same issue on my K8s/CoreOS cluster. It's not on ACS, but it might have the same root cause.
In my case, it was fixed by adding the option --register-with-taints=node-role.kubernetes.io/master=true:NoSchedule to hyperkube.
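To double-check a value before passing it to hyperkube, here is a small parser for the `key=value:Effect` format used in that option, assuming a comma-separated list of such specs. `parse_register_with_taints` is a hypothetical helper, not part of Kubernetes, and it does not replicate every validation rule the kubelet applies:

```python
from typing import List, NamedTuple

VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


class Taint(NamedTuple):
    key: str
    value: str
    effect: str


def parse_register_with_taints(flag_value: str) -> List[Taint]:
    """Parse a comma-separated list of key=value:Effect taint specs."""
    taints = []
    for spec in filter(None, (s.strip() for s in flag_value.split(","))):
        # The effect follows the last ':'; the key may contain '/' and '.'.
        kv, sep, effect = spec.rpartition(":")
        if not sep or effect not in VALID_EFFECTS:
            raise ValueError(f"bad taint spec {spec!r}")
        key, _, value = kv.partition("=")
        if not key:
            raise ValueError(f"missing key in {spec!r}")
        taints.append(Taint(key, value, effect))
    return taints
```

Passing the value from the option above, `node-role.kubernetes.io/master=true:NoSchedule`, yields a single taint with key `node-role.kubernetes.io/master`, value `true` and effect `NoSchedule`.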
The unschedulable field of a node is not respected by the DaemonSet controller.
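That explains why cordoning the master is not enough, while a taint is. The toy model below contrasts the two, under the simplifying assumption that DaemonSet placement ignores `spec.unschedulable` (as stated above) and is only stopped by a `NoSchedule` taint the pod does not tolerate. Node names and the `daemonset_targets` function are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Node:
    name: str
    unschedulable: bool = False  # the field `kubectl cordon` sets
    taints: List[Tuple[str, str]] = field(default_factory=list)  # (key, effect)


def daemonset_targets(nodes: List[Node], tolerated_keys: Set[str]) -> List[str]:
    """Nodes a DaemonSet pod would land on in this simplified model.

    `unschedulable` is deliberately never consulted; only NoSchedule taints
    whose key the pod does not tolerate exclude a node.
    """
    return [
        n.name
        for n in nodes
        if not any(effect == "NoSchedule" and key not in tolerated_keys
                   for key, effect in n.taints)
    ]
```

A cordoned but untainted master still gets the fluentd pod in this model, whereas a master registered with the `node-role.kubernetes.io/master` `NoSchedule` taint does not, unless the DaemonSet tolerates that key.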
This issue was moved to teamhephy/workflow#6