fluent-plugin-gelf
fluent-plugin-gelf copied to clipboard
Fluentd fails to start if gelf tcp output cannot connect
When fluentd is configured with a tcp gelf output, that cannot connect, fluentd errors out, failing to start.
Centos 7 td-agent init scripts keep restarting fluentd, but it is not the expected behavior, as it halts all fluentd log processing and possibly results in message loss even in unrelated pipelines.
It may be related to #7, as fluentd core seems to be interacting improperly with the gelf library output connection. The problem may be in a difference between what fluentd expects and how gelf library behaves, that the plugin doesn't account for, but I don't know much about them.
2017-01-02 23:09:24 +0100 [error]: unexpected error error_class=Errno::ECONNREFUSED error="Connection refused - connect(2) for \"192.168.36.4\" port 12201"
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `initialize'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `new'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `connect'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `block (2 levels) in addresses='
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `map'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `block in addresses='
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:13:in `tap'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:13:in `addresses='
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:9:in `initialize'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:138:in `new'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:138:in `create_sender'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:36:in `initialize'
2017-01-02 23:09:24 +0100 [error]: /etc/td-agent/plugin/out_gelf.rb:42:in `new'
2017-01-02 23:09:24 +0100 [error]: /etc/td-agent/plugin/out_gelf.rb:42:in `start'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/compat/call_super_mixin.rb:42:in `start'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:138:in `block in start'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:117:in `block (2 levels) in lifecycle'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:117:in `block (2 levels) in lifecycle'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:116:in `each'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:116:in `block in lifecycle'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:109:in `each'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/agent.rb:109:in `lifecycle'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:116:in `block in lifecycle'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:113:in `each'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:113:in `lifecycle'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/root_agent.rb:137:in `start'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/engine.rb:211:in `start'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/engine.rb:175:in `run'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:697:in `run_engine'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:454:in `block in run_worker'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:626:in `call'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:626:in `main_process'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/supervisor.rb:449:in `run_worker'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/lib/fluent/command/fluentd.rb:288:in `<top (required)>'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.7/bin/fluentd:5:in `<top (required)>'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/bin/fluentd:23:in `load'
2017-01-02 23:09:24 +0100 [error]: /opt/td-agent/embedded/bin/fluentd:23:in `<top (required)>'
2017-01-02 23:09:24 +0100 [error]: /usr/sbin/td-agent:7:in `load'
2017-01-02 23:09:24 +0100 [error]: /usr/sbin/td-agent:7:in `<main>'
2017-01-02 23:09:24 +0100 [error]: unexpected error error="Connection refused - connect(2) for \"192.168.36.4\" port 12201"
2017-01-02 23:09:24 +0100 [error]: suppressed same stacktrace
Hi @ivan-mjch , I also see this issue. I was trying shutting down Graylog to see if the retry works, and kind of worked:
2018-02-20 00:40:59 +0000 [warn]: #0 suppressed same stacktrace
2018-02-20 00:41:01 +0000 [warn]: #0 failed to flush the buffer. retry_time=2 next_retry_seconds=2018-02-20 00:41:01 +0000 chunk="5659a1017df2404f87465b9399491e6f" error_class=Errno::ECONNREFUSED error="Connection refused - connect(2) for \"graylog-staging.xxx.com\" port 12202"
But then I tried restarting Fluentd for a different reason, and started to get the errors you posted. Seems that as long as Fluentd is not restarted, Graylog connection errors will not break it.
Hello !
The same problem here, we tried to secure our log. I try to explain, we use graylog to handle our logs so we use the gelf plugin.
We would like to send our logs to graylog in normal situation, and if graylog falls keep the logs in a buffer file during this downtime.
Currently our configuration seems to works:
<match **>
@type copy
<store>
@type gelf
host "IP_GRAYLOG"
port XXXXX
protocol tcp
#flush_interval 5s
flush_at_shutdown false
<buffer>
@type file
path /var/log/td-agent/buffer/graylog_host
flush_mode immediate
retry_wait 1 # The wait interval for the first retry.
retry_exponential_backoff_base 1.1 # Inclease the wait time by a factor of N.
retry_type exponential_backoff # Set 'periodic' for constant intervals.
retry_max_interval 5m # Cap the wait interval. (see above)
retry_randomize true # Apply randomization. (see above)
retry_timeout 72h # Maximum duration before giving up.
# retry_max_times 17 # Maximum retry count before giving up.
retry_forever false # Set 'true' for infinite retry loops.
retry_secondary_threshold 0.8 # See the "Secondary Output" section in
total_limit_size 4GB # The size limitation of this buffer plugin instance
# compress gzip # The option to specify compression of each chunks, during events are buffered
</buffer>
</store>
</match>
But we have the same issue reported here. We stop td-agent/fluentd service, we stop graylog and we try to start again the td-agent/fluentd plugin.
When we do that, we have the same behavior (loop) :
2018-06-21 17:46:37 +0200 [error]: #0 unexpected error error_class=Errno::ECONNREFUSED error="Connection refused - connect(2) for \"IP_GRAYLOG\" port XXXXX"
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `initialize'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `new'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:31:in `connect'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `block (2 levels) in addresses='
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `map'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:15:in `block in addresses='
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:13:in `tap'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:13:in `addresses='
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/transport/tcp.rb:9:in `initialize'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:139:in `new'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:139:in `create_sender'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:37:in `initialize'
2018-06-21 17:46:37 +0200 [error]: #0 /etc/td-agent/plugin/out_gelf.rb:35:in `new'
2018-06-21 17:46:37 +0200 [error]: #0 /etc/td-agent/plugin/out_gelf.rb:35:in `start'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/compat/call_super_mixin.rb:42:in `start'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:165:in `block in start'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:154:in `block (2 levels) in lifecycle'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:153:in `each'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:153:in `block in lifecycle'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:140:in `each'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:140:in `lifecycle'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/root_agent.rb:164:in `start'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/engine.rb:274:in `start'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/engine.rb:219:in `run'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:774:in `run_engine'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:523:in `block in run_worker'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:699:in `main_process'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/supervisor.rb:518:in `run_worker'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/lib/fluent/command/fluentd.rb:316:in `<top (required)>'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/site_ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:in `require'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/lib/ruby/gems/2.4.0/gems/fluentd-1.0.2/bin/fluentd:8:in `<top (required)>'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/bin/fluentd:22:in `load'
2018-06-21 17:46:37 +0200 [error]: #0 /opt/td-agent/embedded/bin/fluentd:22:in `<main>'
Any update on this issue ?
Many thanks,
Hello,
This project is dead ?
Regards,
I haven't had much time to work on this ... you may want to look at one of the forks for your immediate needs. I am planning to rename my "parent" repo into an organization so we can get more maintainers involved directly.
Hi @emsearcy, did you have a chance to look at this issue? The fork fluent-plugin-gelf-hs has the same issue.
Sadly that fork disabled issues tab.. and this project isn't actively maintained :(
yeah, I wish I had more time! I'm also not using Graylog any more ... I still love Graylog and Fluentd—both awesome tools—but it's hard to put in the time on a project I'm not running and have no live environments to test with.