Notification scripts exiting with "Argument list too long"

Open sitzmoebelchronograph opened this issue 2 years ago • 7 comments

Bug description

Notifications fail when the check output that Icinga passes to the notification script as command-line arguments is too long.

Reproduce

  • Create a check like check_logfile that checks /tmp/foobar for ERROR. The check itself doesn't matter, it can happen with any check.
  • Create the file /tmp/foobar
  • Fill /tmp/foobar with a line that starts with ERROR to trigger the check, followed by, for example, 100000 random characters with a space after every 100 characters.
  • Icinga will now check the file, and the attempt to send a message will fail with "terminated with exit code 128, output: execvpe(/etc/icinga2/scripts/mail-service-notification.py) failed: Argument list too long"
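The file described in the steps above could be generated with a small helper like the following. This is only a sketch: the function names, the fixed seed and the character set are illustrative choices, not part of the original report.

```python
import random
import string


def build_error_line(payload_chars=100_000, chunk=100, seed=0):
    """Return a line that starts with ERROR, followed by random characters
    with a space after every `chunk` characters."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    alphabet = string.ascii_letters + string.digits
    chunks = []
    remaining = payload_chars
    while remaining > 0:
        n = min(chunk, remaining)
        chunks.append("".join(rng.choices(alphabet, k=n)))
        remaining -= n
    return "ERROR " + " ".join(chunks)


def write_repro_file(path="/tmp/foobar", **kwargs):
    """Write the generated line to the file the check is watching."""
    with open(path, "w", encoding="ascii") as fh:
        fh.write(build_error_line(**kwargs) + "\n")
```

Calling write_repro_file() once before the next check run should be enough to trigger the notification.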

The problem here is that the operating system limits how long the arguments may be. NOTE: THIS IS NOT A PROBLEM WITH THE NOTIFICATION SCRIPTS, IT IS A CONSEQUENCE OF HOW LINUX WORKS.

Cause

The problem is that Icinga passes the whole unabridged error to the notification script, but then the OS comes along and says "no no, that's too much, I'm on strike" and refuses to start the script with the above error.
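For context: on Linux, execve() fails with E2BIG ("Argument list too long") when the combined size of arguments and environment exceeds ARG_MAX, and also when any single string exceeds 32 pages (the kernel constant MAX_ARG_STRLEN, 128 KiB with 4 KiB pages). The sketch below estimates whether a command line would be rejected. The function name is hypothetical, and the size accounting is a simplification that ignores pointer overhead, so treat the result as a rough guide only.

```python
def check_exec_limits(argv, env, arg_max, page_size=4096):
    """Roughly predict whether execve() would reject argv + env.

    Returns None if the call looks fine, otherwise a short reason string.
    Each string is counted as its UTF-8 length plus the trailing NUL byte.
    """
    per_string_cap = 32 * page_size  # MAX_ARG_STRLEN on Linux
    strings = list(argv) + [f"{key}={value}" for key, value in env.items()]
    sizes = [len(s.encode("utf-8")) + 1 for s in strings]
    if any(size > per_string_cap for size in sizes):
        return "single string too long"
    if sum(sizes) > arg_max:
        return "total too long"
    return None
```

On a live system, the arg_max value can be read with os.sysconf("SC_ARG_MAX") and the page size with os.sysconf("SC_PAGE_SIZE").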

Affected versions

  • all since at least 2.11 (we didn't check further back)

Suggested solution

Passed errors should be shortened by Icinga directly before the call is passed to the OS.

sitzmoebelchronograph avatar Apr 12 '22 12:04 sitzmoebelchronograph

I can confirm this issue. This is an important problem on our customer setups.

tuxracer1337 avatar Apr 12 '22 14:04 tuxracer1337

We had a similar problem, but we restricted the output because it was also filling the binlogs of the DB. I guess we improved the quoting and so never hit the "Argument list too long" error, since we replaced the notification objects (use the config of the host if none is specified for the service) and the notification scripts (we wanted a specific look). But besides filling the DB logs, we got error messages from our MTA in the Icinga log saying that the notifications were too long to send.

If I really needed the output, I would rewrite the notification script to fetch the check's output from the DB instead of passing it via command-line arguments. I would also compress it and send it as an attachment. But even then, if you want to send it to external mail servers, you are done once the message exceeds ~10-20 MB.
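The "compress it and send it as an attachment" part could look roughly like this with Python's standard library. It is a sketch under assumptions: the function name, the attachment filename and the 500-character summary are made up for illustration, and how the script obtains the full output (for example from the DB) is left out entirely.

```python
import gzip
from email.message import EmailMessage


def build_notification_mail(sender, recipient, subject, check_output,
                            summary_chars=500):
    """Build a mail with a short plain-text excerpt in the body and the
    full check output attached as a gzip file."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Only the beginning of the output goes into the body.
    msg.set_content(check_output[:summary_chars])
    # The complete output is compressed and attached.
    compressed = gzip.compress(check_output.encode("utf-8"))
    msg.add_attachment(compressed, maintype="application", subtype="gzip",
                       filename="check-output.txt.gz")
    return msg
```

The resulting message could then be handed to smtplib or a local sendmail binary; the MTA's size limits mentioned above still apply to the compressed attachment.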

Long story short: sooner or later you will suffer if the messages aren't sparse and on point! I would advise rewriting the check plugin before rewriting the notification scripts. BTW, I see the included notification objects and scripts as code examples anyway.

slalomsk8er avatar Apr 12 '22 19:04 slalomsk8er

I see the binlog part as problematic too, BUT should I rewrite tons of check scripts? I guess this is not the right way for us ;)

So I guess it would be better if Icinga only collected, e.g., the first 1024 characters of the error message from check results, both to limit the space used in the database and to prevent the "Argument list too long" issue for the notification scripts.

tuxracer1337 avatar Apr 13 '22 08:04 tuxracer1337

You are right, it would be nice if Icinga tried to protect the DB, RAM and whatever else could fail downstream.

Since I reduced the verbosity of the above-mentioned check, Icinga hasn't been killed because of OOM yet!

How about, after extracting the perfdata, limiting the collected output to expr $(getconf ARG_MAX) / 2 / 4 by default? Dividing by 2 provides the margin and dividing by 4 makes it UTF-8 safe. This would leave room for hostname, servicename, state, perfdata and whatever else one needs to pass to the notification scripts.
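The arithmetic behind this proposal can be sketched as follows. The function names are hypothetical and this is not how Icinga does anything today; it only illustrates the suggested default. The division by 4 works because a UTF-8 code point takes at most 4 bytes, so a limit counted in characters can never use more than half of ARG_MAX in bytes.

```python
import os


def default_output_limit(arg_max=None):
    """Suggested default: ARG_MAX / 2 (margin) / 4 (worst-case UTF-8 width),
    expressed as a number of characters."""
    if arg_max is None:
        arg_max = os.sysconf("SC_ARG_MAX")  # same value as `getconf ARG_MAX`
    return arg_max // 2 // 4


def truncate_output(output, max_chars):
    """Cut the plugin output to at most max_chars characters (code points)."""
    return output[:max_chars]
```

One caveat: Linux additionally caps each single argument string at 32 pages (128 KiB with 4 KiB pages), so a limit derived from ARG_MAX alone may still be too large if the whole output is passed as one argument.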

slalomsk8er avatar Apr 13 '22 10:04 slalomsk8er

You could modify the notification command to use the Icinga DSL on $notification_output$ and use substr(start, len) to limit the output.

moreamazingnick avatar Jul 12 '22 12:07 moreamazingnick

Good idea, @moreamazingnick!

Colleagues, shall we do this?

Al2Klimov avatar Jul 19 '22 17:07 Al2Klimov

Sounds good to me too :)

sitzmoebelchronograph avatar Jul 20 '22 08:07 sitzmoebelchronograph

We ran into this issue recently. Notifications had been quiet for a few days; after logging into Icinga2 - surprise! - there were many notifications that had been "sent" but were never received.

Icinga2 log shows the same error: terminated with exit code 128, output: execvpe(/etc/icinga2/scripts/mail-service-notification.sh) failed: Argument list too long

It would have been nice if a separate email was triggered by Icinga, something along the lines of "notification failed".
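Until something like that exists, a workaround could be an external watcher that scans the Icinga log for this failure. The sketch below only does the matching; the function name is made up, the log location and any line format beyond the quoted error text are assumptions, and sending the alert is left out.

```python
FAILURE_MARKER = "Argument list too long"


def find_failed_notifications(log_lines):
    """Return the log lines that report a failed execvpe() call with the
    "Argument list too long" error."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if "execvpe(" in line and FAILURE_MARKER in line
    ]
```

The function accepts any iterable of lines, so an open log file can be passed in directly.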

SpikedCola avatar Jan 13 '23 20:01 SpikedCola