fluentd-kubernetes-daemonset
Fluentd worker crashing on startup when connecting to Graylog
Describe the bug
We installed Fluentd in our AWS EKS cluster, shipping logs to Graylog, and it was working well. Two days ago, however, the Fluentd worker started crashing unexpectedly. The Fluentd pod logs consistently show the following messages:
2024-01-19 04:20:50 +0000 [error]: #0 unexpected error error_class=NameError error="uninitialized constant GELF::Notifier::Fixnum"
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:65:in `level='
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:24:in `initialize'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-gelf-hs-1.0.8/lib/fluent/plugin/out_gelf.rb:52:in `new'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-gelf-hs-1.0.8/lib/fluent/plugin/out_gelf.rb:52:in `start'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/compat/call_super_mixin.rb:42:in `start'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:203:in `block in start'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:192:in `block (2 levels) in lifecycle'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:191:in `each'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:191:in `block in lifecycle'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:178:in `each'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:178:in `lifecycle'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:202:in `start'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/engine.rb:248:in `start'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/engine.rb:147:in `run'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:617:in `block in run_worker'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:962:in `main_process'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:608:in `run_worker'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/command/fluentd.rb:372:in `<top (required)>'
2024-01-19 04:20:50 +0000 [error]: #0 <internal:/usr/local/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
2024-01-19 04:20:50 +0000 [error]: #0 <internal:/usr/local/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/bin/fluentd:15:in `<top (required)>'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/bin/fluentd:25:in `load'
2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/bin/fluentd:25:in `<main>'
2024-01-19 04:20:50 +0000 [error]: Worker 0 exited unexpectedly with status 1
Logs from 19/01/2024, 09:50:44
Any help on how to fix this would be appreciated. I can provide further logs or code if necessary.
To Reproduce
Fluentd pod logs show the same stack trace as under "Describe the bug" above.
Expected behavior
Fluentd should connect to the Graylog instance. It had been working fine for a long time and then suddenly crashed.
Your Environment
- Tag of using fluentd-kubernetes-daemonset: v1-debian-graylog
Your Configuration
fluentd.yaml
#ref: https://github.com/fluent/fluentd-kubernetes-daemonset (fcdf045)
# create an identity for fluentd
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-system
# grant fluentd permissions to read, list, and watch pods and namespaces in Kubernetes cluster
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
  namespace: kube-system
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - namespaces
  verbs:
  - get
  - list
  - watch
# bind the fluentd ServiceAccount to these permissions using the ClusterRoleBinding
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: kube-system
# deploy fluentd DaemonSet
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
    version: v1
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-logging
      version: v1
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
        version: v1
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      # Enable tolerations if you want to run daemonset on master nodes.
      # Recommended to disable on managed k8s.
      # tolerations:
      # - key: node-role.kubernetes.io/master
      #   effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-graylog
        imagePullPolicy: IfNotPresent
        env:
          - name: FLUENT_GRAYLOG_HOST
            value: "log.int.*****.com"
          - name: FLUENT_GRAYLOG_PORT
            value: "12208"
          - name: FLUENT_GRAYLOG_PROTOCOL
            value: "udp"
          - name: FLUENTD_SYSTEMD_CONF
            value: "disable"
        resources:
          requests:
            cpu: 200m
            memory: 0.5Gi
          limits:
            # ===========
            # Less memory leads to child process problems.
            cpu: 1000m
            memory: 1Gi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        securityContext:
          privileged: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
Your Error Log
Same stack trace as shown under "Describe the bug" above.
Additional context
_No response_
Hi, I just got the same error when using this image. I'm not a Ruby programmer, but I've read somewhere that the Fixnum class is deprecated. Maybe there is a mismatch between the Ruby version and the GELF plugin version? If you check https://github.com/graylog-labs/gelf-rb/blob/master/lib/gelf/notifier.rb you'll see that it uses Integer there, but the code in the container is still using Fixnum.
I'll try to update the gems in the image to their newest versions in a custom Dockerfile. Maybe this will do the trick.
I've managed to work around this issue with the following Dockerfile, plus setting LD_PRELOAD="" to fix some other issue. This works for me:
RUN gem install gelf
RUN gem install fluent-plugin-gelf-hs
gem install gelf fluent-plugin-gelf-hs worked for us too. The difference was that gelf 3.1.0 got installed instead of 3.0.0. We manually changed the version in the Gemfile used by the Docker image and it worked.
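To see which gelf version an image actually ships, a small Ruby check along these lines may help. This is only a sketch: `gelf_loads_on?` is a hypothetical helper, and its rule (gelf 3.0.0 and older reference Fixnum, Ruby 3.2 and newer no longer define it) is an assumption based on what is reported in this thread. Inside the container, GEM_PATH may need to point at the vendored bundle for the lookup to find the gem.

```ruby
require "rubygems"

# Hypothetical helper: predicts whether a gelf version is expected to load on
# a given Ruby. Assumption (from this thread): gelf <= 3.0.0 references
# Fixnum, and Fixnum no longer exists on Ruby >= 3.2.
def gelf_loads_on?(gelf_version, ruby_version = RUBY_VERSION)
  ruby_has_fixnum = Gem::Version.new(ruby_version) < Gem::Version.new("3.2")
  gelf_uses_fixnum = Gem::Version.new(gelf_version) <= Gem::Version.new("3.0.0")
  ruby_has_fixnum || !gelf_uses_fixnum
end

# List every installed gelf version visible to RubyGems in this environment.
installed = Gem::Specification.find_all_by_name("gelf").map { |spec| spec.version.to_s }

if installed.empty?
  puts "gelf is not installed in this environment"
else
  installed.each do |version|
    verdict = gelf_loads_on?(version) ? "expected to load" : "expected to fail with NameError"
    puts "gelf #{version} on Ruby #{RUBY_VERSION}: #{verdict}"
  end
end
```

The version comparison uses `Gem::Version` rather than string comparison so that versions such as 3.10.0 would sort correctly.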
This is the chain of related events that led to the failure:
- the graylog flavour of fluentd-kubernetes-daemonset uses gelf 3.0.0, and this version of the gelf gem references Fixnum in its code,
- in Ruby 3.2, Fixnum was removed after having been deprecated since version 2.4: https://www.ruby-lang.org/en/news/2022/12/25/ruby-3-2-0-released/
- Ruby in the newest fluentd image was upgraded to 3.2 (https://github.com/fluent/fluentd-docker-image/commit/4f1d5e8dcdbbed10d1458edaecfb771f6ba9f05e), so the same upgrade happened in fluentd-kubernetes-daemonset: https://github.com/fluent/fluentd-kubernetes-daemonset/commit/442ee9e026f02a6daef729f3ccb487e11dabde45
- gelf 3.0.0 cannot work with Ruby 3.2+, so we see this error on container start.
The good news is that Fixnum was removed in the latest gelf gem version, 3.1.0, in a commit that prepared the gem for the Ruby 2.4 deprecation: https://github.com/graylog-labs/gelf-rb/commit/7cc3cbb63556f54967699e034b1d51cf30bf1c6f. So maybe all that is needed is to bump the gelf version in Gemfile.erb in this project: https://github.com/fluent/fluentd-kubernetes-daemonset/blob/29fdf0324742de37b635e1fa3884366e4f38b183/templates/Gemfile.erb#L48
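The failure mode described above can be reproduced in isolation. The sketch below is not the gelf source: `Demo::Notifier`, its `LEVELS` table, and `RemovedNumericClass` are hypothetical stand-ins, with `RemovedNumericClass` playing the role that Fixnum plays on Ruby 3.2+. It shows why the error message names the enclosing class, and that checking against Integer avoids the problem.

```ruby
module Demo
  # Hypothetical stand-in for a notifier class; this is NOT the gelf source.
  class Notifier
    LEVELS = { debug: 0, info: 1, warn: 2, error: 3 }.freeze
    attr_reader :level

    # Broken variant: references a constant that does not exist, which is how
    # a Fixnum reference behaves on Ruby 3.2 and newer.
    def legacy_level=(new_level)
      @level = new_level.is_a?(RemovedNumericClass) ? new_level : LEVELS.fetch(new_level)
    end

    # Working variant: Integer is available on current Ruby versions.
    def level=(new_level)
      @level = new_level.is_a?(Integer) ? new_level : LEVELS.fetch(new_level.to_s.downcase.to_sym)
    end
  end
end

notifier = Demo::Notifier.new

begin
  notifier.legacy_level = 2
rescue NameError => e
  # The missing constant is reported relative to the class that referenced it,
  # matching the shape of "uninitialized constant GELF::Notifier::Fixnum".
  legacy_error = e.message
  puts legacy_error
end

notifier.level = 2       # an Integer is stored as-is
notifier.level = :error  # a symbolic name is mapped through LEVELS
puts notifier.level
```

Because the constant is only looked up when the setter runs, the gem loads fine and the NameError appears later, when the plugin creates the notifier at startup, which matches the stack trace in this issue.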
Is this issue resolved? Should we update some dependencies? Looks like the gemspec of fluent-plugin-gelf-hs has no problem.
Oh, I see. #219 pinned the version of gelf to 3.0.0, so the image currently installs gelf 3.0.0.
Do we not need #219 anymore? If so, we should revert #219.
> Do we not need #219 anymore?

Before Fluentd v1.8.0, gelf-rb 3.1.0 caused a severe error that prevented Fluentd from starting.
The problem was partially fixed by https://github.com/fluent/fluentd/pull/2709 (Fluentd v1.8.0).
Since Fluentd v1.8.0, Fluentd can start correctly with gelf-rb 3.1.0.
However, gelf-rb 3.1.0 still breaks Fluentd's config parser: Fluentd cannot reload its config via SIGUSR2.
So, we still need #219.
Please note that if you update gelf-rb manually, reloading via SIGUSR2 is not possible.
gelf-rb is no longer maintained: https://github.com/graylog-labs/gelf-rb/issues/93#issuecomment-750307376
It would be better to use gelf_redux (https://github.com/graylog-labs/gelf-rb/issues/93#issuecomment-1603109468).