nagios-plugins-linux
nagios-plugins-linux copied to clipboard
[Support] Information about `check_memory` vendored into Debian's `nagios-plugins-contrib`
Dear Davide,
first things first: Thanks a stack for conceiving and maintaining this collection of excellent check programs.
Introduction
Because this is absolutely not a bug report related to your software at all, please apologize for not using the issue template. It is merely a report where we want to share some observations and caveats when operating Icinga on Debian machines and upgrading to newer releases like Debian bullseye. In this manner, we are humbly asking you to label this issue with info
, support
or similar and close it right away if you see fit.
Observations
It looks like the nagios-plugins-linux
Debian source package recently vendored your monitoring programs ^1, effectively replacing its previous check_memory
program. It happened with those commits:
- Initial adding of madrisan's nagios-plugins-linux (44c2d26)
- Retire old check_memory in favour of madrisan-nagios-plugins-linux's version (eadc785)
@Flowdalic already reported that to the Debian bugtracker at #1005908:
In version 35
check_memory
is a Perl script and the arguments are the low-water mark of free memory, after which an warning or critical result will be reported. In version 37check_memory
is an ELF and the arguments are the high-water mark of used memory after which a warning or critical result will be reported. [...] Hence, invoking the program with the previous parameters will yield different results depending on the installed version. This is clearly not desirable and would require that users need to adjust their monitoring setup to work around the behavior change.
So, effectively, for Debian users transitioning to nagios-plugins-contrib >= 36.20211207
, the check semantics will be inverted. This is a somewhat unfortunate change for them.
Mitigations
As it is not really a bug report (your software works perfectly fine at all!) but more a support recipe for the community, our main aim is to share the steps we took to mitigate this problem on an Icinga instance monitoring systems running both old and new versions.
So, to all people using nagios-plugins-contrib
or monitoring-plugins-contrib
on Debian: The check_memory
program by @madrisan has the --available
option which should be used in this context of making the program's behavior compatible with the previous variant written in Perl. However, the program's --warning|--critical
threshold control options will also need slight adjustments.
Effectively, if that command would have worked before:
check_memory --warning "20%" --critical "10%"
people should now use an invocation like that:
check_memory --available --warning "20%:" --critical "10%:"
Closing words
We hope you appreciate our report here despite the report should go to Debian instead. We will check how to respond to the original ticket at #1005908 properly, referencing this issue. If someone from the community can also do that, we will be more than happy.
With kind regards, Andreas.
After knowing about the different call signature, the challenge was to adjust the Icinga configuration appropriately to support both variants. As we didn't find a way to inquire the information at runtime which package version would be installed, like
$ dpkg-query --show --showformat '${Package} ${Version}\n' monitoring-plugins-contrib
monitoring-plugins-contrib 37.20220312
we decided to introduce and propagate another flag to from the host object to the service object and further to the command object. Based on this flag, the call signature to the check_memory
program is adjusted at runtime.
Hereby, we are sharing the corresponding configuration snippets. Hopefully, they will be helpful to all people who are in need of them.
object Host "foobar.example.org" {
vars.os = "Debian"
vars.kernel = "Linux"
// Newer shipments of `check_memory` have a different call signature.
// Needed for {monitoring,nagios}-plugins-contrib >= 36.20211207.
vars.memory_madrisan = true
}
apply Service "Main memory" {
check_command = "memory"
assign where host.vars.kernel == "Linux"
ignore where host.vars.check_memory == false
// Define thresholds.
vars.warning = host.vars.memory_warning || "20%"
vars.critical = host.vars.memory_critical || "10%"
// Newer shipments of `check_memory` have a different call signature.
// Needed for {monitoring,nagios}-plugins-contrib >= 36.20211207.
vars.madrisan = host.vars.memory_madrisan
if (vars.madrisan) {
vars.warning += ":"
vars.critical += ":"
}
}
object CheckCommand "memory" {
/*
Universal memory check command accounting for different
variants of »check_memory« installed on the system.
Newer shipments of `check_memory` have a different call signature.
Needed for {monitoring,nagios}-plugins-contrib >= 36.20211207.
*/
import "plugin-check-command"
command = [ "/usr/lib/nagios/plugins/check_memory" ]
arguments = {
"--available" = {
description = "display the free/available memory"
set_if = "$madrisan$"
}
"--warning" = {
value = "$warning$"
set_if = {{ if ( len(macro("$warning$")) != 0 ) { return 1 } else { return 0 } }}
description = "Trigger warning when available memory is lower than X"
}
"--critical" = {
value = "$critical$"
set_if = {{ if ( len(macro("$critical$")) != 0 ) { return 1 } else { return 0 } }}
description = "Trigger critical when available memory is lower than X"
}
}
}
Another option would have been not to swap in the program under the same name ^1 while retiring the old one. In this case, the program would be installed as /usr/lib/nagios/check_madrisan_memory
and could have been handled a bit better in transitioning scenarios. However, this suggestion is really meant to be addressed to the Debian maintainers.
Dear @Flowdalic,
we can see that you acknowledged our original post, thank you. Can we humbly ask you to forward the relevant information to the Debian issue tracker if you can find some minutes? Maybe it will be enough to refer to this ticket?
Thank you in advance and with kind regards, Andreas.
Thanks for your report @amotl I was not aware of this commit in Debian :+1: So, please tell me if I can do something for enhancing the compatibility with the previous Debian package. Perhaps a compilation switch?
Dear Davide,
thank you for your kind reply on this issue. We just pinged @bzed and @waja about it at https://github.com/bzed/pkg-nagios-plugins-contrib/issues/93, trying to condense the relevant information further.
We would like to have them and Peter Palfrader have their verdict what would be best for the Debian vendoring as we are not accustomed to the corresponding details. However, we wanted to share our thoughts on the two different aspects which make the new implementation of check_memory
incompatible with the old one, from the perspective of an average Debian user, and how to improve the situation slightly.
1. Threshold value evaluation inversion
Perhaps a compilation switch?
We don't know about a good solution to the aspect of the --available
option yet. While a compilation switch would be able to essentially invert the operation of your check_memory
program, wouldn't that be very confusing for users in the long run to find your program installed on their machines in two different variants? Currently, we see two options.
1.1. Messy workaround
Maybe it will be sufficient to advise adding this snippet to the CheckCommand configuration section?
"--available" = {
description = "display the free/available memory"
set_if = "$madrisan$"
}
However, people who are not aware of this will experience "silent corruption".
1.2. Break things instead of screwing things up
Like mentioned at https://github.com/madrisan/nagios-plugins-linux/issues/104#issuecomment-1082445888, a safer option would be to adjust the retirement commit https://github.com/bzed/pkg-nagios-plugins-contrib/commit/eadc7856 to ship the program as /usr/lib/nagios/check_madrisan_memory
like before, while still removing the old Perl implementation. At least this will signal users that something has changed instead of screwing up the sensor integrity.
2. Threshold value syntax
The other aspect are the additional :
colon characters on the threshold values, currently needed by @madrisan's implementation. We compensated for this with that snippet of Icinga configuration initially shared at https://github.com/madrisan/nagios-plugins-linux/issues/104#issuecomment-1082439499:
// Newer shipments of `check_memory` have a different call signature.
// Needed for {monitoring,nagios}-plugins-contrib >= 36.20211207.
vars.madrisan = host.vars.memory_madrisan
if (vars.madrisan) {
vars.warning += ":"
vars.critical += ":"
}
please tell me if I can do something for enhancing the compatibility
If it would fit the semantics of your program, would it be possible to make those colon characters optional? In this manner, we could reduce the footprint of the workaround at least slightly.
With kind regards, Andreas.
As described in the official nagios documentation, the two syntaxes are not equivalent: 20:
means less than 20 while 20
would mean in our context greater than 20.
Note also that the range is parsed by the function parse_range_string
which is the same coded in the nagios plugins.
So I cannot make the :
optional.
Hii, I am in the same boat, migration hosts from Debian 11 to Debian 12, and now the memory checks fail. But, with a little ternary shell logic things can work on both versions.
Instead of this:
/usr/lib/nagios/plugins/check_memory -w 20% -c 10%
we now use this:
/usr/lib/nagios/plugins/check_memory "$(( $(lsb_release -rs) < 12 )) && echo -w 20% -c 10% || echo -a -w 20%: -c 10%:"
We use the check_by_ssh wrapper, so in reality the command_line
looks like this:
command_line /usr/lib/nagios/plugins/check_by_ssh -q -t 60 -H '$HOSTADDRESS$' '/usr/lib/nagios/plugins/check_memory "$(( $(lsb_release -rs) < 12 )) && echo -w $ARG1$ -c $ARG2$ || echo -a -w $ARG1$: -c $ARG2$:"'
Hi, thanks for the report. So weird. I'll check ASAP.
Can you open a new issue please?
Can you open a new issue please?
My comment/workaround was more meant for reference by others that start using Debian 12 and end up in the same situation. The problem is not so much with this/your script, but rather with the choice of Debian to move from one script to another (incompatible) script.
The problem is not so much with this/your script, but rather with the choice of Debian to move from one script to another (incompatible) script.
It's not incompatible, but you might need to parameter it in another way. And yes, it requires to adjust your configuration.