logstash-filter-verifier icon indicating copy to clipboard operation
logstash-filter-verifier copied to clipboard

Exceptions in Logstash do not necessarily lead to exit code != 0

Open breml opened this issue 7 years ago • 5 comments

TL;DR I suggest to not only check the exit code of Logstash but also take the Logstash log file into account to decide, if the Logstash run was successful, especially if the test cases have failed.

Problem I had a strange problem with logstash-filter-verifier where to processing failed completely, but I got no helpful debug output.

Output of logstash-filter-verifier:

Running tests in 000.json...
Testcase failed, continuing with the rest: Expected 7 event(s), got 0 instead.

Finally I found out, that the problem was, that the path to the grok patterns was wrong. Logstash wrote this error (exception) to the log file, but because logstash did exit with exit code 0, logstash-filter-verifier did not output the log and also deleted the file after the run (expected behavior). So there was no place to check, why this run has failed. This made the debugging quite hard.

Example (using logstash 2.4.1):

echo "test" | logstash -e 'input { stdin { } } filter { grok { match => { "message" => [ "%{GREEDYDATA:alt_message}" ] } patterns_dir => [ "/tmp/bla" ] } } output { stdout {} }' --log ./test.log

In the above example Logstash exits with exit code 0, the Exception is written to the log file.

breml avatar Jan 26 '17 16:01 breml

This is arguably a bug in Logstash, but if we can work around it that'd be great. Have you given any thought regarding how to deal with this? Would it be enough to assume that getting zero events in return actually signals an error?

magnusbaeck avatar Feb 14 '17 20:02 magnusbaeck

I am not sure, if it would be be considered a bug by the Logstash team, that logstash does not exit with a non 0 exit code in this case. The argument in my opinion would be, that Logstash is intended as a long running process. So if a filter has a problem with an (single) event, to rise an exception and log it to the log file is the right thing to do. In this case I would not expect Logstash to keep track of the fact, that at some point a filter failed and therefore the exit code (at the end, when Logstash is finally exiting) should be non zero.

The root cause of this particular example (there might be other cases, where Logstash does exit with a 0 exit code, where there have been exceptions during the log processing) is, that the grok plugin does not check at startup (method register), if all grok patterns for the provided grok filters are available (or in other words, if all the grok filters do compile). This is something, which should be changed in the logstash-filter-grok plugin.

For our particular problem, I think it would make sense to evaluate the output and the log file in every case (not only if the exit code is non 0). Because the above mentioned problem was hard for me to debug, I suggested to add the -logstash-output flag in #18 (but this is kind of an workaround). For a start we may output the content of the log, if we get no events or print a warning, which suggest to the user to add the above mentioned flag -logstash-output.

breml avatar Feb 15 '17 07:02 breml

I had an other look on this issue. My above analysis is wrong.

The exception is thrown in the registermethod of the plugin and therefore you (@magnusbaeck) are right, logstash should exit with an non zero exit code. I opened https://github.com/elastic/logstash/issues/6712

breml avatar Feb 15 '17 08:02 breml

According to https://github.com/elastic/logstash/issues/6712#issuecomment-320282308 this should be fixed in Logstash. But I did not test it again.

breml avatar Nov 20 '20 16:11 breml

Yes, Logstash behaves correctly but LFV doesn't. Here's 2.4.1 for reference:

$ ~/src/logstash-filter-verifier/src/github.com/magnusbaeck/logstash-filter-verifier/logstash-filter-verifier --sockets roles/logstash-hub/templates/logstash/tests roles/logstash-hub/templates/logstash/*-filter.conf.j2
Use Unix domain sockets.
☐ Compare actual event with expected event from client-decoration.yaml:
Expected 2 event(s), got 0 instead.
☐ Compare actual event with expected event from host-decoration.yaml:
Expected 2 event(s), got 0 instead.
☐ Compare actual event with expected event from http-access.yaml:
Expected 2 event(s), got 0 instead.
☐ Compare actual event with expected event from syslog.yaml:
Expected 2 event(s), got 0 instead.
☐ Compare actual event with expected event from tellduslivebeat.yaml:
Expected 4 event(s), got 0 instead.

Summary: ☐ All tests: 0/5
	 ☐ client-decoration.yaml: 0/1
	 ☐ host-decoration.yaml: 0/1
	 ☐ http-access.yaml: 0/1
	 ☐ syslog.yaml: 0/1
	 ☐ tellduslivebeat.yaml: 0/1

6.0.0, still with --sockets:

$ ~/src/logstash-filter-verifier/src/github.com/magnusbaeck/logstash-filter-verifier/logstash-filter-verifier --sockets --logstash-path ~/logstash/logstash-6.0.0/bin/logstash roles/logstash-hub/templates/logstash/tests roles/logstash-hub/templates/logstash/*-filter.conf.j2 
Use Unix domain sockets.
2020/11/24 21:47:16 Error while accept unix socket: accept unix /tmp/717989705/socket: use of closed network connection
2020/11/24 21:47:16 Error while accept unix socket: accept unix /tmp/845580387/socket: use of closed network connection
2020/11/24 21:47:16 Error while accept unix socket: accept unix /tmp/548128397/socket: use of closed network connection
2020/11/24 21:47:16 Error while accept unix socket: accept unix /tmp/917722183/socket: use of closed network connection
2020/11/24 21:47:16 Error while accept unix socket: accept unix /tmp/195550225/socket: use of closed network connection
Write timeout error

I guess that's a slight improvement but it's still bad. Same with 7.0.1:

$ ~/src/logstash-filter-verifier/src/github.com/magnusbaeck/logstash-filter-verifier/logstash-filter-verifier --sockets --logstash-path ~/logstash/logstash-7.0.1/bin/logstash roles/logstash-hub/templates/logstash/tests roles/logstash-hub/templates/logstash/*-filter.conf.j2 
Use Unix domain sockets.
2020/11/24 21:48:35 Error while accept unix socket: accept unix /tmp/126956035/socket: use of closed network connection
2020/11/24 21:48:35 Error while accept unix socket: accept unix /tmp/357424941/socket: use of closed network connection
2020/11/24 21:48:35 Error while accept unix socket: accept unix /tmp/519125479/socket: use of closed network connection
2020/11/24 21:48:35 Error while accept unix socket: accept unix /tmp/205691057/socket: use of closed network connection
2020/11/24 21:48:35 Error while accept unix socket: accept unix /tmp/123809803/socket: use of closed network connection
Write timeout error

However, if we drop --sockets we get a reasonable error message (except for the ridiculous amount of Logstash log output):

$ ~/src/logstash-filter-verifier/src/github.com/magnusbaeck/logstash-filter-verifier/logstash-filter-verifier --logstash-path ~/logstash/logstash-7.0.1/bin/logstash roles/logstash-hub/templates/logstash/tests roles/logstash-hub/templates/logstash/*-filter.conf.j2           
Running tests in client-decoration.yaml...
Error running Logstash: exit status 1.
Process output:
Sending Logstash logs to /tmp/066537891/log which is now configured via log4j2.properties
[2020-11-24T20:48:56,286][DEBUG][logstash.modules.scaffold] Found module {:module_name=>"fb_apache", :directory=>"/home/magnus/logstash/logstash-7.0.1/modules/fb_apache/configuration"}
...
[2020-11-24T20:49:10,151][DEBUG][logstash.agent           ] Converging pipelines state {:actions_count=>0}
[2020-11-24T20:49:10,153][INFO ][logstash.runner          ] Logstash shut down.

It looks like the --sockets code path more or less ignores the health of the Logstash process.

magnusbaeck avatar Nov 24 '20 21:11 magnusbaeck