All outputs fail if one of many outputs fails
Logstash -> TCP: tcp output exception, EOFError: End of file reached
According to:
- https://github.com/logstash-plugins/logstash-output-tcp/issues/9, and
- https://github.com/logstash-plugins/logstash-output-tcp/issues/10
This is caused by the shutdown of the TCP server that Logstash's TCP client is trying to connect to, and this is indeed the case.
As stated in tcp.rb, line 153 of the logstash-output-tcp plugin:
# don't expect any reads, but a readable socket might
# mean the remote end closed, so read it and throw it away.
# we'll get an EOFError if it happens.
However:
Logstash stopped outputting events via all output plugins once the TCP output stopped.
Once the TCP output stopped working because the TCP server was down, all of our other outputs, such as Redis, stopped working as well.
According to:
- https://github.com/elastic/logstash/issues/2463, and
- https://github.com/elastic/logstash/issues/1933, and
- https://github.com/logstash-plugins/logstash-output-tcp/issues/10
Many people have encountered this situation: all outputs fail if one of many outputs fails. It has behaved like this for a long time and, according to different developers' experience, it depends on the plugin: some plugins' failures block all outputs, while others' do not.
I think this is a serious problem that has to be fixed. It is also a more general issue, not tied to one particular Logstash plugin, so I submitted it here.
- Version: 5.6.2
- Operating System:
- Config File (if you have sensitive info, please remove it):
- Sample Data:
- Steps to Reproduce:
We agree that this is a problem! What is your preferred solution here? First, if there are outputs that fail without blocking the whole pipeline, that is a bug and a violation of our durability guarantees. Which ones do not behave like this?
My second question is, what is your desired behavior?
If an output is broken we can either:
- Drop all messages going to it resulting in data loss to that output
- Buffer those messages to disk (which can't last forever)
For your use case, which is preferable?
Hi @andrewvc , thanks for your reply.
- In my case, the TCP output is blocking other outputs (like Redis).
- What I am thinking is that it would be better to make this configurable (in logstash.yml or in logstash.conf, which is another decision) and give the option to the users.
This also depends. We would prefer to temporarily store the data somewhere for recovery. However, we are also using Kafka as a message bus, from which we can retrieve past data and let Logstash process it again, without storing it again on Logstash's side.
@fluency03 this is where it gets so challenging. Where should we store it while we wait? The local FS, somewhere else?
What would your preference be?
I'm thinking the best thing to do here would be to add a new buffering policy to filters and outputs. It would have settings for: blocking (current behavior) and drop (drop any events, never retry).
We could later add a buffer behavior, but there are a lot of complications and tradeoffs there that would take a while to design. Would you be OK with just dropping events @fluency03 in your use case?
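Purely as an illustration of that proposal (neither policy exists in Logstash today, and the option name below is made up for the sketch), a per-output setting might eventually look something like:

output {
  tcp {
    host => "downstream.example"   # placeholder destination
    port => 24224
    # hypothetical option sketching the proposal above -- not a real setting
    failure_policy => "drop"       # or "blocking" (today's behavior); "buffer" could come later
  }
}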
Even just dropping events in the offending output and logging a WARN to stderr would be better than the current behaviour of one plugin being able to lock up the entire logstash instance.
@wadejensen A custom plugin would allow you to make this decision, should you want it.
Logstash is designed, and intended, to never drop data. The idea of dropping data when an output is slow or unreachable is not something I will be easily convinced of. The impact of our existing design choice is that Logstash goes as fast as the slowest component.
An output "failing" is a very complex subject. Failure is subjective for everyone -- there are so many symptoms which could be classified as a network partition, temporary or otherwise, and you are asking to drop data any time there is any kind of fault. In many ways, an overloaded server is fundamentally indistinguishable from a failed server.
If you are open to data loss during network partitions or other faults, you have a few options for outputs:
- use the UDP output (see the sketch after this list); assuming DNS is functioning (?), packets will go out and it's up to the network to lose or deliver them.
- use something like rabbitmq or redis which allows you to drop data for anything downstream that's not listening (though a failure manifests differently here, and also requires rabbitmq/redis be online).
- Write a custom plugin; this could mean forking our existing plugins and implementing your dropping behavior yourself.
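To illustrate the first option, a minimal sketch of a lossy UDP output (hostname and port are placeholders); delivery is fire-and-forget, so anything the network drops is simply gone:

output {
  udp {
    host => "collector.example.internal"   # placeholder destination
    port => 5140
    codec => json_lines                    # serialize each event as a JSON line
  }
}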
My entire sysadmin/operations experience has informed Logstash's never-drop-data design. I am open to discussing other behaviors, but it will take more than saying "it would be better to drop" to convince me. This is not meant as a challenge, but to say that I have considered for many years these concerns and I still resist most requests to drop data during network faults. I am listening to you, though, and I appreciate this feedback and discussion.
We have some ideas (@andrewvc's been exploring them) for adding some kind of stream branching where you can, by your pipeline's design, have lossy/asynchronous outputs in a separate pipeline but still have strong delivery attempts on other outputs. I don't know how this will look in the long-term, but it is on our radar. It's less a checkbox to enable "drop data when an output is having problems" and more a way to model your pipeline's delivery priorities.
The wonderful thing to look forward to in 6.0 is independent pipelines (yay!). While the feature itself doesn't solve the problem you're describing, it provides easier methods to mitigate it while better, more complete solutions are worked on.
Imagine a single Logstash pipeline that receives from source S, processes the events, and broadcasts them in parallel to n instances of a broker (Redis, Kafka, etc.). Then you can have independent "publishing" instances each reading from their own broker instance and shipping to the intended outbound service independent from the others. The best part of 6.0 is that all of these pipelines would exist within the same JVM, rather than separate instances. With Monitoring enabled, you'd be able to see individual flow rates for each of the "output" pipelines.
In the future, Logstash may (subject to change or alteration at any time) allow you to route this traffic flow internally, removing the need for the broker altogether, via the stream branching flow that @jordansissel just mentioned. The team is aware of the shortcomings and is working on ways to improve things.
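For concreteness, a minimal sketch of what that broker pattern could look like in pipelines.yml once 6.0's multiple pipelines are available (pipeline IDs and config paths are placeholders):

# pipelines.yml -- each pipeline runs independently inside the same JVM
- pipeline.id: intake
  path.config: "/etc/logstash/conf.d/intake.conf"        # reads from source S, fans out to the brokers
- pipeline.id: publish-es
  path.config: "/etc/logstash/conf.d/publish-es.conf"    # reads its broker, ships to Elasticsearch
- pipeline.id: publish-kafka
  path.config: "/etc/logstash/conf.d/publish-kafka.conf" # reads its broker, ships to Kafka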
@untergeek thank you for describing this better than I was able ❤️
@jordansissel Thanks for your response and suggestions. I think we are optimising for different things here so I doubt one will convince the other but I'll try and express my take.
You've said that intentional data loss or dropping data is not the answer, but from my perspective the current solution is unintuitive and does cause data to be lost.
My use case: I have 120 baremetal nodes each running Filebeat to collect logs written to the baremetal filesystem each of which passes logs to one of 3 dockerised Logstash instances in a round robin format. We run a multitenant platform in which only privileged users have access to the baremetal filesystem, so this Logstash is the only access which users have for retrieving logs created by their various applications.
There are outputs to Elasticsearch, Kafka, and other Logstash instances downstream via TCP, which are managed by the users, who can create their own filters.
For me, it beggars belief that my platform Logstash service should be brought to its knees if a user takes down their downstream Kafka or Logstash for some reason. The messages I am responsible for delivering to Elasticsearch do not get delivered, and are effectively lost unless I manually retrieve them from the baremetal filesystem or replay the Filebeat events. You're not really preventing data loss, just punting the responsibility to the upstream service, or in the case of your UDP suggestion, to the network.
I think where we clash is that our system is designed to service multiple use cases, and Logstash as it is designed does not cater to that; rather, it assumes there should be one Logstash per output in most cases, particularly in a multitenant environment.
This just feels like a shame and a missed opportunity from my perspective, but it's not your job to make free software for me that fits my needs; it's my job to pick software which meets them.
To that end, it might be beneficial to point out prominently in the documentation that Logstash in the current single-pipeline mode operates as slowly as the slowest output, as I don't think this is intuitive to users. It's an easy expectation to have, and understandable that we might get upset when the software doesn't do what we thought it said it would on the tin.
Thank you very much for your contributions to Logstash.
@wadejensen are you aware of the upcoming multiple pipelines feature in 6.0? Does that change things? https://www.elastic.co/guide/en/logstash/master/multiple-pipelines.html
Thanks for all of your discussion.
I also saw this regarding multiple-pipelines:
Having multiple pipelines in a single instance also allows these event flows to have different performance and durability parameters (for example, different settings for pipeline workers and persistent queues). This separation means that a blocked output in one pipeline won’t exert backpressure in the other.
I think this is a really good design for logical separation.
Today, we encountered another output-blocking case: the elasticsearch output also blocks other outputs when the elasticsearch output plugin hits an error creating an index.
Even though multiple pipelines could partially solve this problem, the problem still remains. For example, there can also be multiple outputs within a single pipeline, and all outputs of that pipeline can still be blocked by one blocking output.
I am also wondering whether it would be a good idea to have a 'rescue' output.
For example, suppose the data I receive is somehow incomplete and it is passed to the elasticsearch output, and my elasticsearch output is doing dynamic indexing like:
index => '%{[@metadata][beat]}-%{+YYYY.MM}'
document_type => '%{[@metadata][type]}'
user => 'logstash'
password => '${LOGSTASH2ELASTICSEARCH}'
However, due to the incomplete data, a wrong index name is produced and that index cannot be created (also because I have given the user logstash rules so that it can only create certain types of indices).
Now, Logstash will keep repeating the following action, blocking its own output to Elasticsearch, which in turn blocks all other outputs:
[2017-11-15T11:27:26,293][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"security_exception", "reason"=>"action [indices:admin/create] is unauthorized for user [logstash]"})
I am wondering: could we have a mechanism or a special output plugin that rescues from an output failure?
For example, the previous blocking was caused by wrong indexing. Then what we could have might look like this:
try {
elasticsearch {
hosts => ['hostname']
ssl => true
cacert => '/etc/logstash/certs/root.cer'
index => '%{[@metadata][beat]}-%{+YYYY.MM}'
document_type => '%{[@metadata][type]}'
user => 'logstash'
password => '${LOGSTASH2ELASTICSEARCH}'
}
} rescue {
elasticsearch {
hosts => ['hostname']
ssl => true
cacert => '/etc/logstash/certs/root.cer'
index => 'failure-%{+YYYY.MM}'
document_type => 'failure'
user => 'logstash'
password => '${LOGSTASH2ELASTICSEARCH}'
}
}
Then the failed data would be reindexed into the index failure-%{+YYYY.MM}, no outputs would be blocked, and we would also have a record of the data that caused the failure.
Maybe it's similar to this:
if "_jsonparsefailure" in [tags] {
elasticsearch {
hosts => ['hostname']
ssl => true
cacert => '/etc/logstash/certs/root.cer'
index => 'failure-%{+YYYY.MM}'
document_type => 'failure'
user => 'logstash'
password => '${LOGSTASH2ELASTICSEARCH}'
}
}
That is more a case of knowing which filter plugin failures occurred and acting on them in the output section, whereas the rescue would act on output plugin failures.
Should I open another issue for discussing output failure rescue?
I am wondering: could we have a mechanism or a special output plugin that rescues from an output failure?
We have such a mechanism today. It's too hard to use, but it exists. This mechanism is called the dead letter queue (DLQ). https://www.elastic.co/guide/en/logstash/current/dead-letter-queues.html
We only currently deliver things to the DLQ if it is something we consider "permanently undeliverable" -- and there are basically only two cases for this: One, on a mapping exception from ES which is unfixable without destroying data. Two, when an index is closed (which is debatable, since you can open the index), but is at least a property of the event.
I don't think I would consider sending 403s to the DLQ by default. Maybe we can make it configurable, but never by default. If you want such a feature, please open an issue on the logstash-output-elasticsearch repo :)
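For reference, a minimal sketch of wiring up the DLQ as it exists today (the path and pipeline ID are common defaults and may need adjusting; the catch-all index is only an example, not a recommendation for the 403 case above):

# logstash.yml -- enable the dead letter queue (currently only the elasticsearch output writes to it)
dead_letter_queue.enable: true

# A separate pipeline that replays DLQ entries into a catch-all index:
input {
  dead_letter_queue {
    path => "/usr/share/logstash/data/dead_letter_queue"   # default data path on the official packages
    commit_offsets => true
    pipeline_id => "main"
  }
}
output {
  elasticsearch {
    hosts => ['hostname']
    index => 'failure-%{+YYYY.MM}'
  }
}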
Even though, as mentioned here by @jordansissel:
An output "failing" is a very complex subject. Failure is subjective for everyone.
As far as I am concerned, this should be a design matter for Logstash in the first place, i.e., wouldn't it be more reasonable to make each plugin work separately, asynchronously, and "reactively", so that one plugin's failure won't have an impact on the others?
I am wondering how the Logstash plugins (input/filter/output) work.
- Are they running as different threads? If so, one output plugin should not block the others.
- Are they working asynchronously or reactively? If so, there should also be no blocking between different plugins.
Thanks :)
Having a similar issue with the elasticsearch and kinesis plugins. If the elasticsearch plugin fails (e.g. because the ES cluster is unavailable), then no data is delivered to Kinesis either.
Having the same issue with the kafka and tcp output plugins: if the kafka plugin fails, then no data is delivered to Fluentd over TCP.
output {
kafka {
...
}
tcp {
host => "kafka-fluentd"
port => 24224
codec=> "json"
}
}
https://github.com/elastic/logstash/pull/9225 may address the concerns raised in this thread. Would that approach be useful to those of you facing this issue?
Has there ever been a solution that fixes this issue? One failure blocks all other outputs and tends to fill up the logs rather quickly, e.g.:
[2019-09-26T13:47:25,865][WARN ][logstash.outputs.syslog ] syslog tcp output exception: closing, reconnecting and resending event {:host=>"X.X.X.X", :port=>514, :exception=>#<Errno::ECONNREFUSED: Connection refused - connect(2) for "X.X.X.X" port 514>, :backtrace=>["org/jruby/ext/socket/RubyTCPSocket.java:135:in `initialize'", "org/jruby/RubyIO.java:876:in `new'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-syslog-3.0.5/lib/logstash/outputs/syslog.rb:209:in `connect'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-syslog-3.0.5/lib/logstash/outputs/syslog.rb:177:in `publish'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-codec-line-3.0.8/lib/logstash/codecs/line.rb:50:in `encode'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-syslog-3.0.5/lib/logstash/outputs/syslog.rb:147:in `receive'", "/usr/share/logstash/logstash-core/lib/logstash/outputs/base.rb:89:in `block in multi_receive'", "org/jruby/RubyArray.java:1792:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/outputs/base.rb:89:in `multi_receive'", "org/logstash/config/ir/compiler/OutputStrategyExt.java:118:in `multi_receive'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:101:in `multi_receive'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:239:in `block in start_workers'"], :event=>#<LogStash::Event:0x2ba1184a>}
(The same WARN entry repeats roughly every second: 13:47:26,872; 13:47:27,878; 13:47:28,885; 13:47:29,892; 13:47:30,898; ...)
@BobTheBuilder7828 Look into pipeline-to-pipeline communication, as this allows you to create discrete pipelines per output.
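A rough sketch of that approach (ports, hosts, and the facility/severity values are placeholders): the pipeline output/input plugins let an upstream pipeline fan out to one pipeline per output, each with its own queue, so a blocked destination only backs up its own pipeline:

# pipelines.yml
- pipeline.id: intake
  config.string: |
    input { beats { port => 5044 } }
    output { pipeline { send_to => ["es-out", "syslog-out"] } }
- pipeline.id: es-out
  queue.type: persisted        # buffer here if Elasticsearch is down
  config.string: |
    input { pipeline { address => "es-out" } }
    output { elasticsearch { hosts => ["https://es:9200"] } }
- pipeline.id: syslog-out
  queue.type: persisted        # a refused syslog connection only backs up this pipeline
  config.string: |
    input { pipeline { address => "syslog-out" } }
    output { syslog { host => "X.X.X.X" port => 514 facility => "local7" severity => "informational" } }

Note that once a persisted queue fills up, backpressure still reaches the intake pipeline, so size the queues for the outage window you want to survive.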
Is there any way to "throttle" the error output and/or the retry delta? Either have a pre-determined/set period for a retry, or have it back off, meaning 30s, 60s, 120s, etc., so as not to pollute the logs so badly?
Exponential back-off settings will depend on the plugin. You can't disable or throttle the warning messages, but you might be able to use an API call to set the level to ERROR rather than INFO. See https://www.elastic.co/guide/en/logstash/current/logging.html for information on that.
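For example (assuming the default API port of 9600 and that the noisy logger is logstash.outputs.syslog), raising that logger's threshold via the node logging API would look roughly like:

curl -XPUT 'localhost:9600/_node/logging?pretty' \
  -H 'Content-Type: application/json' \
  -d '{ "logger.logstash.outputs.syslog" : "ERROR" }'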
Honestly, I'd be more concerned that port 514 is unavailable on the remote side. That's a pretty standard service (syslog), and it being unavailable is an abnormality that should be logged pretty heavily. This is why for a syslog output, Logstash does not throttle retries or error messages. It's a tcp port that is expected to be open and remain open at all times.
Yeah, I agree … just planning for if/when an output host is offline for some reason … do not want it to kill the entire thing.
Thank you for your replies.
Sadly, I'm facing the same issue. In my output section there is an if-else condition to output either to our internal Elasticsearch server or to an external Kafka server, based on a tag of the document. But if Kafka fails (lost connection, Kafka broker not available), all output to my internal Elasticsearch also stops working.
Sending to multiple pipelines (basically replicating the data), then acting on those individual pipelines (even if they are doing the same operation, just to a different host) was the only way I was able to get around this issue. One failed host for output kills the whole works unless you break it apart (pipeline-to-pipeline) as outlined above.
It worked. But Logstash says one pipeline worker per CPU core, so it's not a feasible solution if you have multiple pipelines.
That number is only a default setting. You can dial it up higher manually.
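For instance, pipeline.workers can be set per pipeline in pipelines.yml instead of relying on the CPU-count default (IDs, paths, and values below are just placeholders):

- pipeline.id: es-out
  pipeline.workers: 2
  path.config: "/etc/logstash/conf.d/es-out.conf"
- pipeline.id: kafka-out
  pipeline.workers: 2
  path.config: "/etc/logstash/conf.d/kafka-out.conf"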
Also, Logstash doesn't start if an output cannot connect (for example, to a RabbitMQ server). Using pipeline-to-pipeline to separate input from output doesn't work: the input starts, but the Logstash HTTP API doesn't, and we use that endpoint to mark the Docker container as ready.
I would like to control the pipeline flow better:
What about a new setting input_always_up:
- false ==> current behavior
- true ==> Logstash starts anyway, then queues events until the output is OK.
Also, in the case of multiple outputs, another new setting to configure the behavior, output_error_behavior:
- output_queue_if_single_error ==> queue for the failing output, keep sending to the other(s)
- output_skip_error ==> ignore the output error, keep sending to the other(s)
- output_queue_all ==> current behavior
Or, an option in the pipeline definition, such as:
input { http {}}
output {
kafka {
ignore_failure => false
stop_all_outputs => true
}
tcp {
ignore_failure => true
}
}