Fluentd pod crashing on Azure Container Service

Open sbulman opened this issue 7 years ago • 6 comments

Hi All,

I'm following the instructions to set up Deis on Azure Container Service. One of the deis-logger-fluentd pods is crashing with the following log.

2017-08-05 07:21:26 +0000 [info]: reading config file path="/opt/fluentd/conf/fluentd.conf"
2017-08-05 07:22:27 +0000 [error]: config error file="/opt/fluentd/conf/fluentd.conf" error_class=Fluent::ConfigError error="Invalid Kubernetes API v1 endpoint https://10.0.0.1:443: Timed out connecting to server"

Any ideas?

Thanks.

sbulman avatar Aug 05 '17 07:08 sbulman

A bit more info. I created the ACS cluster with 1 agent. The fluentd pod that is crashing is on the master node. The pod running on the agent appears to be working fine.

sbulman avatar Aug 05 '17 07:08 sbulman

We're facing the same issue, same symptoms and circumstances as @sbulman. The fluentd logger pod continually crashes on the master node on Azure ACS.

ghost avatar Sep 25 '17 19:09 ghost

There should not be a fluentd pod running on the master node. There was an open ticket about DaemonSet pods being accidentally scheduled on the Kubernetes master node, which was eventually solved upstream.

More background context in this ticket, which was resolved in Kubernetes 1.5.0+ via https://github.com/kubernetes/kubernetes/pull/35526.

bacongobbler avatar Sep 25 '17 19:09 bacongobbler

Ok, thanks @bacongobbler for the context. It still appears to be an issue on ACS today, though. Any thoughts are much appreciated!

The fluentd logger pod event for the master node indicates the following error:

Error syncing pod, skipping: failed to "StartContainer" for "deis-logger-fluentd" with CrashLoopBackOff: "Back-off 10s restarting failed container=deis-logger-fluentd pod=deis-logger-fluentd-swjnl_deis

K8S versions (client and Azure Container Service):

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-14T06:55:55Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.6", GitCommit:"7fa1c1756d8bc963f1a389f4a6937dc71f08ada2", GitTreeState:"clean", BuildDate:"2017-06-16T18:21:54Z", GoVersion:"go1.7.6", Compiler:"gc", Platform:"linux/amd64"}

Deis version 2.18.0

The fluentd pod is definitely running on the master node on ACS, as shown by the event logs; in this case it was created by: k8s-master-47933ef9-0

ghost avatar Sep 25 '17 21:09 ghost

I also hit the same issue on my K8s/CoreOS cluster. It's not on ACS, but it might have the same root cause.

In my case, it was fixed by adding the option --register-with-taints=node-role.kubernetes.io/master=true:NoSchedule to hyperkube.

The unschedulable field of a node is not respected by the DaemonSet controller.
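
To illustrate why the taint helps where the unschedulable field does not, here is a minimal Python sketch of taint/toleration matching. It is a simplified model, not the actual DaemonSet controller code: the names Taint, Toleration, tolerates and daemon_pod_allowed are made up for this example, and the assumption that only NoSchedule and NoExecute taints block placement is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional

# Simplifying assumption: these effects block placement; PreferNoSchedule does not.
BLOCKING_EFFECTS = {"NoSchedule", "NoExecute"}


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: Optional[str] = None     # None matches any key (only with "Exists")
    operator: str = "Equal"       # "Equal" or "Exists"
    value: str = ""
    effect: Optional[str] = None  # None matches any effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if this toleration matches this taint."""
    if tol.effect is not None and tol.effect != taint.effect:
        return False
    if tol.key is None:
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value


def daemon_pod_allowed(node_taints: List[Taint],
                       tolerations: List[Toleration],
                       unschedulable: bool = False) -> bool:
    """Hypothetical model of DaemonSet placement on a single node.

    `unschedulable` is deliberately ignored, mirroring the behaviour
    described above; only untolerated blocking taints keep the pod off.
    """
    for taint in node_taints:
        if taint.effect not in BLOCKING_EFFECTS:
            continue
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


# The taint from the --register-with-taints option above.
master_taint = Taint("node-role.kubernetes.io/master", "true", "NoSchedule")

# A master that is only marked unschedulable still gets the daemon pod.
unschedulable_only = daemon_pod_allowed([], [], unschedulable=True)

# A tainted master does not, unless the pod tolerates the taint.
tainted_master = daemon_pod_allowed([master_taint], [])
tainted_but_tolerated = daemon_pod_allowed(
    [master_taint],
    [Toleration(key="node-role.kubernetes.io/master",
                operator="Exists", effect="NoSchedule")],
)
```

In this model the fluentd DaemonSet would need an explicit toleration to land on a tainted master, which is why registering the master with the taint keeps the pod off it.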

monaka avatar Dec 25 '17 07:12 monaka

This issue was moved to teamhephy/workflow#6

Cryptophobia avatar Mar 20 '18 19:03 Cryptophobia